Abstract. In this paper, we study turn-based quantitative multiplayer non zero-sum
games played on finite graphs with both reachability and safety objectives. In this
framework a player with a reachability objective aims at reaching his own goal as soon
as possible, whereas a player with a safety objective aims at avoiding his bad set or,
if impossible, delaying its visit as long as possible. We prove the existence of Nash
equilibria with finite memory in quantitative multiplayer reachability/safety games.
Moreover, we prove the existence of finite-memory secure equilibria for quantitative
two-player reachability games.

General framework. The construction of correct and efficient computer systems (hardware or
software) is recognized as an extremely difficult task. To support the design and verification of
such systems, mathematical logic, automata theory [10] and more recently model-checking [7]
have been intensively studied. The model-checking approach, which is now an important part
of the design cycle in industries, has proved its efficiency when applied to systems that can
be accurately modeled as a finite-state automaton. In contrast, the application of these
techniques to computer software, complex systems like embedded systems or distributed systems
has been less successful. This could be partly explained by the following reasons: classical
automata-based models do not faithfully capture the complex interactive behavior of modern
computational systems that are usually composed of several interacting components, also
interacting with an environment that is only partially under control. Recent research works show
that it is suitable to generalize automata models used in the classical approach to verification,
with the more flexible and mathematically deeper game-theoretic framework [14, 15].

Game theory meets automata theory. The basic framework that extends computational
models with concepts from game theory is that of the so-called two-player zero-sum games played on
graphs [5]. Many problems in verification and design of reactive systems can be modeled with
this approach, like modeling controller-environment interactions. Given a model of a system
interacting with a hostile environment, given a control objective (like preventing the system
from reaching some bad configurations), the controller synthesis problem asks to build a controller
ensuring that the control objective is enforced whatever the environment will do. Two-player
zero-sum games played on graphs are adequate models to solve this problem [16]. Moves of
Player 1 model actions of the controller whereas moves of Player 2 model the uncontrollable
actions of the environment, and a winning strategy for Player 1 is an abstract form of a control
program that enforces the control objective. 

The controller synthesis problem is suitable to model purely antagonistic interactions
between a controller and a hostile environment. However, in order to study more complex
systems with more than two components whose objectives are not necessarily antagonistic, we
need multiplayer and non zero-sum games to model them adequately. Moreover, we do not
look for winning strategies, but rather try to find relevant notions of equilibria, for instance
the famous notion of Nash equilibria [14]. On the other hand, only qualitative objectives have
been considered so far to specify, for example, that a player must be able to reach a target set 
of states in the underlying game graph. But, in line with the previous point, we also want to 
express and solve games for quantitative objectives such as forcing the game to reach a par- 
ticular set of states within a given time bound, or within a given energy consumption limit. In 
summary, we need to study equilibria for multiplayer non zero-sum games played on graphs 
with quantitative objectives. This article provides some new results in this research direction. 

Related work. Several recent papers have considered two-player zero-sum games played on 
finite graphs with regular objectives enriched by some quantitative aspects. Let us mention 
some of them: games with finitary objectives [6], games with prioritized requirements [1], 
request-response games where the waiting times between the requests and the responses are
minimized [11, 17], and games whose winning conditions are expressed via quantitative
languages [2].

Other works concern qualitative non zero-sum games. The notion of secure equilibrium,
an interesting refinement of Nash equilibrium, has been introduced in [5]. It has been proved
that a unique secure equilibrium always exists for two-player non zero-sum games with
regular objectives. In [9], general criteria ensuring existence of Nash equilibria, subgame perfect
equilibria (resp. secure equilibria) are provided for n-player (resp. 2-player) games, as well as
complexity results.

Finally, we mention reference [3] that combines both quantitative and non zero-sum
aspects. It is probably the work closest to ours; however, the framework and the
objectives are quite different. In [3], the authors study games played on graphs with terminal
vertices where quantitative payoffs are assigned to the players. These games may have cycles
but all the infinite plays form a single outcome (like in chess where every infinite play is a
draw). That paper gives criteria that ensure the existence of Nash (and subgame perfect)
equilibria in pure and memoryless strategies.

Our contribution. We here study turn-based quantitative multiplayer non zero-sum games
played on finite graphs with reachability objectives. In this framework each player aims at
reaching his own goal as soon as possible. We focus on existence results for two solution
concepts: Nash equilibrium and secure equilibrium. We prove the existence of finite-memory
Nash (resp. secure) equilibria in n-player (resp. 2-player) games. Moreover, we prove that
given a Nash (resp. secure) equilibrium of an n-player (resp. 2-player) game, we can build
a finite-memory Nash (resp. secure) equilibrium of the same type, i.e. preserving the set of
players achieving their objectives. For the case of Nash equilibria, we extend our results in
two directions. First we prove that finite-memory Nash equilibria still exist when the model is
enriched by allowing n-tuples of non-negative costs on edges (one cost per player). This result
provides an answer to a question we posed in [4]. Secondly, we prove the existence of Nash
equilibria in quantitative games where both safety and reachability objectives coexist.



Our results are not a direct consequence of the existing results in the qualitative framework;
they require some new proof techniques. To the best of our knowledge, this is the first general
result about the existence of equilibria in quantitative multiplayer games played on graphs.

Organization of the paper. Section 2 is dedicated to definitions. We present the games and
the equilibria we study. In Section 3 we first prove an existence result for Nash equilibria
and provide the finite-memory characterization. Similar results concerning secure equilibria in
two-player games are established in Section 4. Finally, in Section 5 we discuss the extensions
of our results on Nash equilibria.

A part of these results has been published in [4], namely the existence of finite-memory Nash
(resp. secure) equilibria in multiplayer (resp. 2-player) games, and the fact that given a Nash
equilibrium we can build a finite-memory Nash equilibrium of the same type. Additionally in
this paper we give proofs of the previous results and we extend our existence result for Nash
equilibria in the two directions mentioned above, namely (i) n-tuples of non-negative costs on
edges and (ii) reachability/safety objectives. Moreover, in the two-player case, we prove that
given a secure equilibrium, we can build a finite-memory secure equilibrium of the same type.

2 Preliminaries 
2.1 Definitions 

We consider here quantitative games played on a graph where all the players have reachability
objectives. It means that, given a certain set of vertices $\mathrm{Goal}_i$, each player $i$ wants to reach one
of these vertices as soon as possible.

This section is mainly inspired by reference [9]. 

Definition 1. An infinite turn-based quantitative multiplayer reachability game is a tuple
$\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, v_0, E, (\mathrm{Goal}_i)_{i \in \Pi})$ where

• $\Pi$ is a finite set of players,

• $G = (V, (V_i)_{i \in \Pi}, v_0, E)$ is a finite directed graph where $V$ is the set of vertices, $(V_i)_{i \in \Pi}$
is a partition of $V$ into the state sets of each player, $v_0 \in V$ is the initial vertex, and
$E \subseteq V \times V$ is the set of edges, and

• $\mathrm{Goal}_i \subseteq V$ is the goal set of player $i$.

We assume that each vertex has at least one outgoing edge. The game is played as follows.
A token is first placed on the vertex $v_0$. Player $i$, such that $v_0 \in V_i$, has to choose one of the
outgoing edges of $v_0$ and put the token on the vertex $v_1$ reached when following this edge.
Then, it is the turn of the player who owns $v_1$. And so on.
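As an illustration of Definition 1, the tuple can be written down as a small data structure. The sketch below is only one possible encoding, in Python; the class name `ReachabilityGame`, the helper `move_token` and the field names are hypothetical. The sample graph is the two-player game discussed in Example 5, with its edge set inferred from the plays and strategies mentioned there (the figure itself is not reproduced in this text), so it should be read as an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReachabilityGame:
    players: tuple        # the finite set of players
    owner: dict           # vertex -> player owning it (encodes the partition (V_i))
    initial: str          # the initial vertex v_0
    edges: frozenset      # the set E of pairs (v, v')
    goals: dict           # player i -> goal set Goal_i

    def successors(self, v):
        """Targets of the outgoing edges of v, in sorted order."""
        return sorted(w for (u, w) in self.edges if u == v)

    def validate(self):
        """Check the side conditions stated in Definition 1."""
        vertices = set(self.owner)
        assert self.initial in vertices
        assert all(u in vertices and w in vertices for (u, w) in self.edges)
        assert all(self.owner[v] in self.players for v in vertices)
        assert all(self.successors(v) for v in vertices)   # at least one outgoing edge
        assert all(self.goals[i] <= vertices for i in self.players)


def move_token(game, choose, steps):
    """Move the token `steps` times; `choose(player, history)` picks the next vertex."""
    history = [game.initial]
    for _ in range(steps):
        v = history[-1]
        nxt = choose(game.owner[v], tuple(history))
        assert (v, nxt) in game.edges          # only existing edges may be followed
        history.append(nxt)
    return history


# Assumed edge set for the game of Example 5 (inferred from the text, not from the figure).
EXAMPLE = ReachabilityGame(
    players=(1, 2),
    owner={"A": 1, "B": 2, "C": 1, "D": 1},
    initial="A",
    edges=frozenset({("A", "B"), ("A", "D"), ("B", "A"),
                     ("B", "C"), ("C", "A"), ("D", "A")}),
    goals={1: frozenset({"C"}), 2: frozenset({"D"})},
)
EXAMPLE.validate()


def first_successor(player, history):
    """A hypothetical memoryless choice: always take the first outgoing edge."""
    return EXAMPLE.successors(history[-1])[0]


print(move_token(EXAMPLE, first_successor, 4))
```

The `choose` callback receives the whole history, so the same helper can drive memoryless as well as history-dependent choices.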

A play $\rho \in V^\omega$ (resp. a history $h \in V^+$) of $\mathcal{G}$ is an infinite (resp. a finite) path through the
graph $G$ starting from vertex $v_0$. Note that a history is always non empty because it starts
with $v_0$. The set $H \subseteq V^+$ is made up of all the histories of $\mathcal{G}$. A prefix (resp. proper prefix) $p$
of a history $h = h_0 \ldots h_k$ is a finite sequence $h_0 \ldots h_l$, with $l \leq k$ (resp. $l < k$), denoted by
$p \leq h$ (resp. $p < h$). We similarly consider a prefix $p$ of a play $\rho$, denoted by $p < \rho$.

We say that a play $\rho = \rho_0 \rho_1 \ldots$ visits a set $S \subseteq V$ (resp. a vertex $v \in V$) if there exists
$l \in \mathbb{N}$ such that $\rho_l$ is in $S$ (resp. $\rho_l = v$). The same terminology also stands for a history $h$.
Similarly, we say that $\rho$ visits $S$ after (resp. in) a prefix $\rho_0 \ldots \rho_k$ if there exists $l > k$ (resp.
$l \leq k$) such that $\rho_l$ is in $S$. For any play $\rho$ we denote by $\mathrm{Visit}(\rho)$ the set of players $i \in \Pi$
such that $\rho$ visits $\mathrm{Goal}_i$. The set $\mathrm{Visit}(h)$ for a history $h$ is defined similarly. The function $\mathrm{Last}$
returns, given a history $h = h_0 \ldots h_k$, the last vertex $h_k$ of $h$, and the length $|h|$ of $h$ is the
number $k$ of its edges.

^ The general case of reachability/safety objectives is handled in Subsection 5.1.

^ Note that the length is not defined as the number of vertices.

For any play $\rho = \rho_0 \rho_1 \ldots$ of $\mathcal{G}$, we note $\mathrm{Cost}_i(\rho)$ the cost of player $i$, defined by:

$$\mathrm{Cost}_i(\rho) = \begin{cases} l & \text{if } l \text{ is the least index such that } \rho_l \in \mathrm{Goal}_i, \\ +\infty & \text{otherwise.} \end{cases}$$

We note $\mathrm{Cost}(\rho) = (\mathrm{Cost}_i(\rho))_{i \in \Pi}$ the cost profile for the play $\rho$. The aim of each player $i$ is to
minimize the cost he has to pay, i.e. reach his goal set $\mathrm{Goal}_i$ as soon as possible.

A strategy of player $i$ in $\mathcal{G}$ is a function $\sigma : V^* V_i \to V$ assigning to each history $hv$ ending
in a vertex $v$ of player $i$, a next vertex $\sigma(hv)$ such that $(v, \sigma(hv))$ belongs to $E$. We say that a
play $\rho = \rho_0 \rho_1 \ldots$ of $\mathcal{G}$ is consistent with a strategy $\sigma$ of player $i$ if $\rho_{k+1} = \sigma(\rho_0 \ldots \rho_k)$ for all
$k \in \mathbb{N}$ such that $\rho_k \in V_i$. The same terminology is used for a history $h$ of $\mathcal{G}$. A strategy profile
of $\mathcal{G}$ is a tuple $(\sigma_i)_{i \in \Pi}$ where $\sigma_i$ is a strategy for player $i$. It determines a unique play of $\mathcal{G}$
consistent with each strategy $\sigma_i$, called the outcome of $(\sigma_i)_{i \in \Pi}$ and denoted by $\langle (\sigma_i)_{i \in \Pi} \rangle$.

A strategy $\sigma$ of player $i$ is memoryless if $\sigma$ depends only on the current vertex, i.e. $\sigma(hv) =
\sigma(v)$ for all $h \in H$ and $v \in V_i$. More generally, $\sigma$ is a finite-memory strategy if the equivalence
relation $\approx_\sigma$ on $H$ defined by $h \approx_\sigma h'$ if $\sigma(h\delta) = \sigma(h'\delta)$ for all $\delta \in V^* V_i$ has finite index.
In other words, a finite-memory strategy is a strategy that can be implemented by a finite
automaton with output. A strategy profile $(\sigma_i)_{i \in \Pi}$ is called memoryless or finite-memory if
each $\sigma_i$ is a memoryless or a finite-memory strategy, respectively.

For a strategy profile $(\sigma_i)_{i \in \Pi}$ with outcome $\rho$ and a strategy $\sigma'_j$ of player $j$ ($j \in \Pi$), we
say that player $j$ deviates from $\rho$ after a prefix $h$ of $\rho$ if there exists a prefix $h'$ of $\rho$ such that
$h \leq h'$, $h'$ is consistent with $\sigma'_j$ and $\sigma'_j(h') \neq \sigma_j(h')$. We also say that player $j$ deviates from $\rho$
just after a prefix $h$ of $\rho$ if $h$ is consistent with $\sigma'_j$ and $\sigma'_j(h) \neq \sigma_j(h)$.

We now introduce the notion of Nash equilibrium and secure equilibrium.

Definition 2. A strategy profile $(\sigma_i)_{i \in \Pi}$ of a game $\mathcal{G}$ is a Nash equilibrium if for all players $j \in
\Pi$ and for all strategies $\sigma'_j$ of player $j$, we have:

$$\mathrm{Cost}_j(\rho) \leq \mathrm{Cost}_j(\rho')$$

where $\rho = \langle (\sigma_i)_{i \in \Pi} \rangle$ and $\rho' = \langle \sigma'_j, (\sigma_i)_{i \in \Pi \setminus \{j\}} \rangle$.

This definition means that player $j$ (for all $j \in \Pi$) has no incentive to deviate since he
cannot strictly decrease his cost when using $\sigma'_j$ instead of $\sigma_j$. Keeping notations of Definition 2 in mind, a
strategy $\sigma'_j$ such that $\mathrm{Cost}_j(\rho) > \mathrm{Cost}_j(\rho')$ is called a profitable deviation for player $j$ with
respect to $(\sigma_i)_{i \in \Pi}$. In this case either player $j$ pays an infinite cost for $\rho$ and a finite cost for
$\rho'$ ($\rho'$ visits $\mathrm{Goal}_j$, but $\rho$ does not), or player $j$ pays a finite cost for $\rho$ and a strictly lower cost
for $\rho'$ ($\rho'$ visits $\mathrm{Goal}_j$ earlier than $\rho$ does).

As our results on secure equilibria stand for two-player games, we define this notion only
in this context. In order to define the concept of secure equilibrium we first need to associate
two appropriate binary relations $\prec_1$ and $\prec_2$ on cost profiles with player 1 and 2 respectively.
Given two cost profiles $(x_1, x_2)$ and $(y_1, y_2)$:

$$(x_1, x_2) \prec_1 (y_1, y_2) \quad \text{iff} \quad (x_1 > y_1) \lor (x_1 = y_1 \land x_2 < y_2).$$

We then say that player 1 prefers $(y_1, y_2)$ to $(x_1, x_2)$. In other words, player 1 prefers a cost
profile to another either if he can decrease his own cost, or if he can increase the cost of
player 2, while keeping his own cost. We define the relation $\prec_2$ symmetrically.

^ Our definition naturally extends the notion of secure equilibrium proposed in [5] to the quantitative
reachability framework. A longer discussion comparing the two notions can be found in Section 5.2.
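The cost function and the relations $\prec_1$, $\prec_2$ can be evaluated mechanically on ultimately periodic plays. The following Python sketch assumes that a play is given as a finite prefix followed by a non-empty cycle repeated forever; this representation and the function names `cost`, `cost_profile`, `prec1` and `prec2` are choices made for the illustration only. The sample plays are written over the vertex names A, B, C, D with the hypothetical goal sets {C} for player 1 and {D} for player 2.

```python
import math


def cost(prefix, cycle, goal):
    """Cost_i of the play prefix.cycle^omega: least index l with rho_l in goal, else +inf.

    One pass over the prefix and a single copy of the cycle is enough, because
    later positions only repeat vertices of the cycle.
    """
    for index, vertex in enumerate(list(prefix) + list(cycle)):
        if vertex in goal:
            return index
    return math.inf


def cost_profile(prefix, cycle, goals):
    """Cost profile (Cost_1, Cost_2, ...) for a list of goal sets."""
    return tuple(cost(prefix, cycle, goal) for goal in goals)


def prec1(x, y):
    """(x1, x2) <_1 (y1, y2): player 1 prefers the cost profile y to x."""
    return x[0] > y[0] or (x[0] == y[0] and x[1] < y[1])


def prec2(x, y):
    """Symmetric relation for player 2."""
    return x[1] > y[1] or (x[1] == y[1] and x[0] < y[0])


GOALS = [{"C"}, {"D"}]                     # hypothetical goal sets of players 1 and 2
print(cost_profile("", "AD", GOALS))       # the play (AD)^omega
print(cost_profile("AD", "ABC", GOALS))    # the play AD(ABC)^omega
```

Since `math.inf` compares as expected with integers, the two relations need no special case for infinite costs.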

Definition 3. A strategy profile $(\sigma_1, \sigma_2)$ of a two-player game $\mathcal{G}$ is a secure equilibrium if
there does not exist any strategy $\sigma'_1$ of player 1 such that:

$$\mathrm{Cost}(\rho) \prec_1 \mathrm{Cost}(\rho')$$

where $\rho = \langle \sigma_1, \sigma_2 \rangle$ and $\rho' = \langle \sigma'_1, \sigma_2 \rangle$, and there does not exist any strategy $\sigma'_2$ of player 2 such
that:

$$\mathrm{Cost}(\rho) \prec_2 \mathrm{Cost}(\rho')$$

where $\rho = \langle \sigma_1, \sigma_2 \rangle$ and $\rho' = \langle \sigma_1, \sigma'_2 \rangle$.

In other words, player 1 (resp. 2) has no incentive to deviate, with respect to the relation $\prec_1$
(resp. $\prec_2$). Note that any secure equilibrium is a Nash equilibrium. A strategy $\sigma'_j$ such that
$\mathrm{Cost}(\rho) \prec_j \mathrm{Cost}(\rho')$ is called a $\prec_j$-profitable deviation for player $j$ with respect to $(\sigma_1, \sigma_2)$ (for
$j \in \{1, 2\}$).

Let us go back to the multiplayer framework and define the notion of type of an equilibrium.

Definition 4. The type of a strategy profile $(\sigma_i)_{i \in \Pi}$ in a reachability game $\mathcal{G}$ is the set of
players $j \in \Pi$ such that the outcome $\rho$ of $(\sigma_i)_{i \in \Pi}$ visits $\mathrm{Goal}_j$. It is denoted by $\mathrm{Type}((\sigma_i)_{i \in \Pi})$.

In other words, $\mathrm{Type}((\sigma_i)_{i \in \Pi}) = \mathrm{Visit}(\rho)$.

The previous definitions are illustrated in the following example.

Example 5. Let $\mathcal{G} = (V, V_1, V_2, v_0, E, \mathrm{Goal}_1, \mathrm{Goal}_2)$ be the two-player game depicted in
Figure 1. The states of player 1 (resp. 2) are represented by circles (resp. squares). Thus,
according to Figure 1, $V_1 = \{A, C, D\}$ and $V_2 = \{B\}$, the initial vertex $v_0$ is the vertex $A$, and
we set $\mathrm{Goal}_1 = \{C\}$ and $\mathrm{Goal}_2 = \{D\}$.

Fig. 1. A two-player game with $\mathrm{Goal}_1 = \{C\}$ and $\mathrm{Goal}_2 = \{D\}$.

An example of play in $\mathcal{G}$ is given by $\rho = (AD)^\omega$, which visits $\mathrm{Goal}_2$ but not $\mathrm{Goal}_1$, leading
to the cost profile $\mathrm{Cost}((AD)^\omega) = (+\infty, 1)$. The play $\rho$ is, among others, the outcome of the
strategy profile $(\sigma_1, \sigma_2)$ where $\sigma_1(hA) = D$ and $\sigma_2(hB) = C$, for all histories $h$.

^ We will keep this convention through the article.

^ Note that player 1 has no choice in vertices $C$ and $D$, that is, $\sigma_1(hv)$ is necessarily equal to $A$
for $v \in \{C, D\}$.



Let us show that the strategy profile $(\sigma_1, \sigma_2)$ is not a Nash equilibrium, by proving that
player 1 has a profitable deviation $\sigma'_1$ in which he manages to decrease his own cost. With
$\sigma'_1$ defined by $\sigma'_1(hA) = B$, we get the play $\langle \sigma'_1, \sigma_2 \rangle = (ABC)^\omega$ such that $\mathrm{Cost}((ABC)^\omega) =
(2, +\infty)$, and in particular $\mathrm{Cost}_1((ABC)^\omega) < \mathrm{Cost}_1(\rho)$.

On the opposite side, one can show that $(\sigma'_1, \sigma_2)$ is a Nash equilibrium. However $(\sigma'_1, \sigma_2)$
is not a secure equilibrium. Indeed, player 2 has a $\prec_2$-profitable deviation in which he can
increase player 1's cost without modifying his own cost. With $\sigma'_2$ the strategy of player 2 defined
by $\sigma'_2(hB) = A$, we get the play $\langle \sigma'_1, \sigma'_2 \rangle = (AB)^\omega$ such that $\mathrm{Cost}((AB)^\omega) = (+\infty, +\infty)$, and
$\mathrm{Cost}(\langle \sigma'_1, \sigma_2 \rangle) \prec_2 \mathrm{Cost}(\langle \sigma'_1, \sigma'_2 \rangle)$.

Notice that all strategies discussed so far are memoryless. In order to obtain a Nash
equilibrium of type $\{1, 2\}$, finite-memory strategies are necessary. We define the following
finite-memory strategy profile $(\tau_1, \tau_2)$:



The outcome $\pi = \langle \tau_1, \tau_2 \rangle$ is equal to $AD(ABC)^\omega$ and has costs $(4, 1)$. In order to prove that
$(\tau_1, \tau_2)$ is a Nash equilibrium, we prove that no player has a profitable deviation. For player 2
it is clearly impossible to get a cost less than 1. To try to get a cost less than 4, player 1
must use a strategy $\tau'_1$ such that $\tau'_1(A) = B$. But then player 2 chooses $\tau_2(AB) = A$. The
prefix $ABA$ of the outcome of $(\tau'_1, \tau_2)$ shows that player 1 cannot reach $C$ before index 4, so
his cost is at least 4.

However $(\tau_1, \tau_2)$ is not a secure equilibrium since player 2 has a $\prec_2$-profitable deviation $\tau'_2$
such that $\tau'_2(hB) = A$ for all histories $h$. One can show that, in this example, there is no secure
equilibrium of type $\{1, 2\}$.
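The claims of Example 5 about memoryless profiles can be checked mechanically. The Python sketch below is an illustration, not a procedure taken from this paper: it relies on the observation that, once the other players are fixed to memoryless strategies, the least cost a deviating player can obtain is a shortest-path distance, computed here by breadth-first search. The edge set `EDGES` is inferred from the plays and strategies of the example and is therefore an assumption, and the names `outcome_cost`, `best_response_cost` and `is_nash` are hypothetical. The finite-memory profile $(\tau_1, \tau_2)$ is not covered by this check.

```python
import math
from collections import deque

# Graph of Example 5 (edge set inferred from the text, hence an assumption).
EDGES = {"A": ["B", "D"], "B": ["A", "C"], "C": ["A"], "D": ["A"]}
OWNER = {"A": 1, "B": 2, "C": 1, "D": 1}
GOALS = {1: {"C"}, 2: {"D"}}
START = "A"


def outcome_cost(profile, player):
    """Cost of `player` on the outcome of a memoryless profile (vertex -> next vertex)."""
    vertex, seen, index = START, set(), 0
    while vertex not in seen:              # the outcome is periodic once a vertex repeats
        if vertex in GOALS[player]:
            return index
        seen.add(vertex)
        vertex, index = profile[vertex], index + 1
    return math.inf


def best_response_cost(profile, player):
    """Least cost `player` can obtain when all other players keep their memoryless choices."""
    dist, queue = {START: 0}, deque([START])
    while queue:
        vertex = queue.popleft()
        if vertex in GOALS[player]:
            return dist[vertex]            # BFS pops vertices by increasing distance
        nexts = EDGES[vertex] if OWNER[vertex] == player else [profile[vertex]]
        for w in nexts:
            if w not in dist:
                dist[w] = dist[vertex] + 1
                queue.append(w)
    return math.inf


def is_nash(profile):
    """A memoryless profile is a Nash equilibrium iff no player can strictly lower his cost."""
    return all(best_response_cost(profile, j) >= outcome_cost(profile, j) for j in GOALS)


sigma = {"A": "D", "B": "C", "C": "A", "D": "A"}         # (sigma_1, sigma_2)
sigma_prime = {"A": "B", "B": "C", "C": "A", "D": "A"}   # (sigma'_1, sigma_2)
print(is_nash(sigma), is_nash(sigma_prime))
```

Checking secure equilibria would require comparing whole cost profiles with $\prec_1$ and $\prec_2$ rather than a single distance, which this sketch does not attempt.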

The questions studied in this article are the following ones:

Problem 1 Given $\mathcal{G}$ a quantitative multiplayer (resp. two-player) reachability game, does
there exist a Nash equilibrium (resp. a secure equilibrium) in $\mathcal{G}$?

Problem 2 Given a Nash equilibrium (resp. a secure equilibrium) in a quantitative
multiplayer (resp. two-player) reachability game $\mathcal{G}$, does there exist a finite-memory Nash
equilibrium (resp. secure equilibrium) with the same type?

We provide positive answers in Sections 3 and 4. Notice that these problems have been
investigated in the qualitative framework (see [9]).

2.2 Qualitative Games vs Quantitative Games 

We show in this section that Problems 1 and 2 can not be reduced to problems on qualitative
games.

Given a quantitative multiplayer reachability game $\mathcal{G}$, one can naturally define a qualitative
version of $\mathcal{G}$, denoted by $\overline{\mathcal{G}}$, such that the payoffs are qualitative. Given a play $\rho$ of $\overline{\mathcal{G}}$, the
qualitative payoff of player $i$ is defined by:

$$\mathrm{Payoff}_i(\rho) = \begin{cases} \mathrm{Win} & \text{if } \rho \text{ visits } \mathrm{Goal}_i, \\ \mathrm{Lose} & \text{otherwise.} \end{cases}$$

^ For qualitative games, we use the notion of payoff rather than the notion of cost since Win (resp.
Lose) can be seen as a payoff of 1 (resp. 0) and the aim of the players is to maximize their payoffs.

We note $\mathrm{Payoff}(\rho) = (\mathrm{Payoff}_i(\rho))_{i \in \Pi}$ the qualitative payoff profile for the play $\rho$. In this
framework, player $i$ aims at reaching his own goal set, i.e. at obtaining payoff Win. With this
idea in mind, one can naturally adapt the notion of Nash (resp. secure) equilibrium to the
qualitative framework.

The existence of Nash (resp. secure) equilibria in n-player (resp. 2-player) qualitative
games $\overline{\mathcal{G}}$ has been proved in [9, Corollary 12] (resp. [5, Theorem 2]) for reachability objectives,
and more generally for Borel objectives.

The next example illustrates that lifting Nash equilibria in $\overline{\mathcal{G}}$ to Nash equilibria in $\mathcal{G}$ does
not work. We developed new ideas in Sections 3 and 4 to solve Problem 1.

Example 6. Let us now consider the two-player game $\mathcal{G}$ depicted in Figure 2, such that $\mathrm{Goal}_1 =
\{B, E\}$ and $\mathrm{Goal}_2 = \{C\}$. Notice that only player 1 effectively plays in this game. We are going
to exhibit a secure (and thus Nash) equilibrium $(\sigma_1, \sigma_2)$ in the qualitative game $\overline{\mathcal{G}}$ that can
be lifted neither to a secure nor to a Nash equilibrium in the quantitative game $\mathcal{G}$. The
strategy profile $(\sigma_1, \sigma_2)$ is defined such that $\langle \sigma_1, \sigma_2 \rangle = ADE^\omega$. It is a secure equilibrium
in $\overline{\mathcal{G}}$ with the qualitative payoff profile (Win, Lose). However $(\sigma_1, \sigma_2)$ is not a Nash (and thus
not a secure) equilibrium in $\mathcal{G}$. Indeed, the play $ABC^\omega$ provides a smaller cost to player 1,
i.e. $\mathrm{Cost}_1(ABC^\omega) < \mathrm{Cost}_1(ADE^\omega)$. Notice that in this example, there is no equilibrium in $\mathcal{G}$
of type $\{1\}$.

Fig. 2. A game $\mathcal{G}$ with an equilibrium in $\overline{\mathcal{G}}$ that can not be lifted to $\mathcal{G}$.



The next proposition shows that on the opposite side, any Nash equilibrium in a
quantitative game $\mathcal{G}$ can be lifted to a Nash equilibrium in the qualitative game $\overline{\mathcal{G}}$.

Proposition 7. If $(\sigma_i)_{i \in \Pi}$ is a Nash equilibrium in a quantitative multiplayer reachability
game $\mathcal{G}$, then $(\sigma_i)_{i \in \Pi}$ is also a Nash equilibrium in $\overline{\mathcal{G}}$.

Proof. For a contradiction, let us assume that in $\overline{\mathcal{G}}$, player $j$ has a profitable deviation $\sigma'_j$
w.r.t. $(\sigma_i)_{i \in \Pi}$. This is only possible if $\mathrm{Payoff}_j(\langle (\sigma_i)_{i \in \Pi} \rangle) = \mathrm{Lose}$ and $\mathrm{Payoff}_j(\langle \sigma'_j, (\sigma_i)_{i \in \Pi \setminus \{j\}} \rangle) =
\mathrm{Win}$. Thus when playing $\sigma'_j$ against $(\sigma_i)_{i \in \Pi \setminus \{j\}}$, player $j$ manages to visit $\mathrm{Goal}_j$. Clearly
enough, $\sigma'_j$ would also be a profitable deviation w.r.t. $(\sigma_i)_{i \in \Pi}$ in $\mathcal{G}$, contradicting the
hypothesis. □

Note that Proposition 7 is false for secure equilibria. To see that, let us come back to
the game $\mathcal{G}$ of Figure 2. The strategy profile $(\sigma_1, \sigma_2)$ such that $\langle \sigma_1, \sigma_2 \rangle = ABC^\omega$ is a secure
equilibrium in the quantitative game $\mathcal{G}$ but not in the qualitative game $\overline{\mathcal{G}}$.

2.3 Unraveling 

In the proofs of this article we need to unravel the graph $G = (V, (V_i)_{i \in \Pi}, v_0, E)$ from the
initial vertex $v_0$, which ends up in an infinite tree, denoted by $T$. This tree can be seen as a
new graph where the set of vertices is the set $H$ of histories of $\mathcal{G}$, the initial vertex is $v_0$, and
a pair $(hv, hvv') \in H \times H$ is an edge of $T$ if $(v, v') \in E$. A history $h$ is a vertex of player $i$
in $T$ if $\mathrm{Last}(h) \in V_i$, and it belongs to the goal set of player $i$ if $\mathrm{Last}(h) \in \mathrm{Goal}_i$.

We denote by $\mathcal{T}$ the related game. This game $\mathcal{T}$ played on the unraveling $T$ of $G$ is
equivalent to the game $\mathcal{G}$ that is played on $G$ in the following sense. A play $(\rho_0)(\rho_0 \rho_1)(\rho_0 \rho_1 \rho_2) \ldots$
in $\mathcal{T}$ induces a unique play $\rho = \rho_0 \rho_1 \rho_2 \ldots$ in $\mathcal{G}$, and conversely. Thus, we denote a play in $\mathcal{T}$
by the respective play in $\mathcal{G}$. The bijection between plays of $\mathcal{G}$ and plays of $\mathcal{T}$ allows us to use
the same cost function $\mathrm{Cost}$, and to transform easily strategies in $\mathcal{G}$ to strategies in $\mathcal{T}$ (and
conversely).

We also need to study the tree $T$ limited to a certain depth $d > 0$: we note $\mathrm{Trunc}_d(T)$
the truncated tree of $T$ of depth $d$ and $\mathrm{Trunc}_d(\mathcal{T})$ the finite game played on $\mathrm{Trunc}_d(T)$. More
precisely, the set of vertices of $\mathrm{Trunc}_d(T)$ is the set of histories $h \in H$ of length $\leq d$; the edges
of $\mathrm{Trunc}_d(T)$ are defined in the same way as for $T$ except that for the histories $h$ of length $d$,
there exists no edge $(h, hv)$. A play $\rho$ in $\mathrm{Trunc}_d(\mathcal{T})$ corresponds to a history of $\mathcal{G}$ of length
equal to $d$. The notions of cost and strategy are defined exactly like in the game $\mathcal{T}$, but limited
to the depth $d$. For instance, a player pays an infinite cost for a play $\rho$ (of length $d$) if his goal
set is not visited by $\rho$.
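The vertex set of the truncated tree can be enumerated level by level. The Python sketch below is an illustration under assumptions: the function name `truncated_histories` is hypothetical, and the sample edge set is the one inferred from Example 5. Histories are represented as tuples of vertices, and the length of a history is its number of edges.

```python
def truncated_histories(edges, start, depth):
    """Vertices of the truncated tree: all histories from `start` with at most `depth` edges."""
    levels = [[(start,)]]                       # level k holds the histories of length k
    for _ in range(depth):
        levels.append([h + (w,) for h in levels[-1] for w in edges[h[-1]]])
    return [h for level in levels for h in level]


EDGES = {"A": ["B", "D"], "B": ["A", "C"], "C": ["A"], "D": ["A"]}   # assumed edge set
histories = truncated_histories(EDGES, "A", 2)
plays = [h for h in histories if len(h) - 1 == 2]   # plays of the truncated game
print(len(histories), plays)
```

The number of histories can grow exponentially with the depth, so this enumeration is meant for small examples only.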

2.4 Qualitative Two-player Zero-sum Reachability Games

In this section we recall well-known properties of qualitative two-player zero-sum reachability
games [8, Chapter 2]. This will be necessary in our proofs.

Definition 8. A qualitative two-player zero-sum reachability game is a tuple $\mathcal{G} = (V, V_1, V_2, E, \mathrm{Goal})$
where

• $G = (V, V_1, V_2, E)$ is a finite directed graph where $V$ is the set of vertices, $V_1, V_2$ is a
partition of $V$ into the state sets of player 1 and player 2, and $E \subseteq V \times V$ is the set of
edges,

• $\mathrm{Goal} \subseteq V$ is the goal set of player 1.

Given an initial vertex $v_0 \in V$, the notions of play, history and strategy are the same as
the ones defined in Section 2.1. Player 1 (resp. player 2) wins a play $\rho$ of $\mathcal{G}$ if $\rho$ visits $\mathrm{Goal}$
(resp. $\rho$ does not visit $\mathrm{Goal}$). The game is said to be zero-sum because every play is won by exactly
one of the two players.

In zero-sum games, it is interesting to know if one of the players can play in such a way
that he is sure to win, no matter how the other player plays. We can formalize this by introducing
the notion of winning strategy. A strategy $\sigma_i$ for player $i$ is a winning strategy from an initial
vertex $v$ if all plays of $\mathcal{G}$ starting in $v$ that are consistent with $\sigma_i$ are won by player $i$. If player $i$
has a winning strategy in $\mathcal{G}$ from $v$, we say that player $i$ wins the game $\mathcal{G}$ from $v$. We say that
a game $\mathcal{G}$ is determined if for all $v \in V$, one of the two players has a winning strategy from $v$.

Martin showed [13] that every qualitative two-player zero-sum game with a Borel type
winning condition is determined. In particular, we have the following proposition:

Proposition 9 ([8]). Let $\mathcal{G} = (V, V_1, V_2, E, \mathrm{Goal})$ be a qualitative two-player zero-sum
reachability game. Then for all $v \in V$, one of the two players has a memoryless winning strategy
from $v$ (in particular, $\mathcal{G}$ is determined).

Moreover for all vertices $v$ from which he wins the game, player 1 (resp. player 2) has a
memoryless strategy that is independent of $v$ and that forces the play to visit $\mathrm{Goal}$ within at
most $|V| - 1$ edges (resp. to stay in $V \setminus \mathrm{Goal}$).
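The set of vertices from which player 1 wins such a game is classically obtained by an attractor computation. The text does not spell this construction out, so the Python sketch below is the standard textbook technique, given as an illustration; the function name `attractor` and the sample data are hypothetical. It assumes, as in Definition 1, that every vertex has at least one outgoing edge.

```python
def attractor(edges, player1_vertices, goal):
    """Vertices from which player 1 can force a visit to `goal`, with a memoryless strategy.

    Vertices are added layer by layer: a vertex of player 1 needs one successor in the
    current set, a vertex of player 2 needs all its successors in the current set.
    """
    attr, strategy = set(goal), {}
    while True:
        layer = set()
        for v, succ in edges.items():
            if v in attr:
                continue
            if v in player1_vertices:
                good = [w for w in succ if w in attr]
                if good:
                    layer.add(v)
                    strategy[v] = good[0]       # memoryless choice of player 1 at v
            elif all(w in attr for w in succ):
                layer.add(v)
        if not layer:
            return attr, strategy
        attr |= layer


# Graph inferred from Example 5 (an assumption); player 1 owns A, C and D.
EDGES = {"A": ["B", "D"], "B": ["A", "C"], "C": ["A"], "D": ["A"]}
print(attractor(EDGES, {"A", "C", "D"}, {"C"}))          # player 1 against player 2
print(attractor(EDGES, {"A", "C", "D"}, {"C", "D"}))     # hypothetical larger goal set
```

Vertices outside the returned set are those from which the opponent can keep the play away from the goal set forever.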



3 Nash Equilibria 



From now on we will often use the term game to denote a quantitative multiplayer reachability
game according to Definition 1.

3.1 Existence of a Nash Equilibrium

In this section we positively solve Problem 1 for Nash equilibria.

Theorem 10. In every quantitative multiplayer reachability game, there exists a finite-memory
Nash equilibrium.

The proof of this theorem is based on the following ideas. By Kuhn's theorem (Theorem 11),
there exists a Nash equilibrium in the game $\mathrm{Trunc}_d(\mathcal{T})$ played on the finite tree $\mathrm{Trunc}_d(T)$, for
any depth $d$. By choosing an adequate depth $d$, Proposition 13 enables us to extend this Nash
equilibrium to a Nash equilibrium in the infinite tree $T$, and thus in $\mathcal{G}$. Let us detail these
ideas.

We first recall Kuhn's theorem [12]. A preference relation is a total reflexive transitive
binary relation.

Theorem 11 (Kuhn's theorem). Let $\Gamma$ be a finite tree and $\mathcal{G}_\Gamma$ a game played on $\Gamma$. For
each player $i \in \Pi$, let $\precsim_i$ be a preference relation on cost profiles. Then there exists a strategy
profile $(\sigma_i)_{i \in \Pi}$ such that for every player $j \in \Pi$ and every strategy $\sigma'_j$ of player $j$ in $\mathcal{G}_\Gamma$ we
have

$$\mathrm{Cost}(\rho') \precsim_j \mathrm{Cost}(\rho)$$

where $\rho = \langle (\sigma_i)_{i \in \Pi} \rangle$ and $\rho' = \langle \sigma'_j, (\sigma_i)_{i \in \Pi \setminus \{j\}} \rangle$.

Note that $\mathrm{Cost}(\rho') \precsim_j \mathrm{Cost}(\rho)$ means that player $j$ prefers the cost profile of the play $\rho$
to the one of $\rho'$, or they are equivalent for him.

Corollary 12. Let $\mathcal{G}$ be a game and $T$ be the unraveling of $G$. Let $\mathrm{Trunc}_d(\mathcal{T})$ be the game
played on the truncated tree of $T$ of depth $d$, with $d > 0$. Then there exists a Nash equilibrium
in $\mathrm{Trunc}_d(\mathcal{T})$.

Proof. For each player $j \in \Pi$, we define the relation $\precsim_j$ on cost profiles in the following way:
let $(x_i)_{i \in \Pi}$ and $(y_i)_{i \in \Pi}$ be two cost profiles, we say that $(x_i)_{i \in \Pi} \precsim_j (y_i)_{i \in \Pi}$ iff $x_j \geq y_j$. It is
clearly a preference relation which captures the Nash equilibrium. The strategy profile $(\sigma_i)_{i \in \Pi}$
of Kuhn's theorem is then a Nash equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$. □
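Kuhn's theorem is classically proved by backward induction on the finite tree, and the same idea gives a direct way to compute an equilibrium of the truncated game on small examples. The Python sketch below is an illustration, not the construction used in this paper: the function name `backward_induction` is hypothetical, ties are broken in favour of the first successor, and the sample graph is the one inferred from Example 5. At each history the owner picks a successor minimizing his own cost, which matches the preference relation used in the proof of Corollary 12.

```python
import math


def backward_induction(edges, owner, goals, start, depth):
    """Return (cost profile, strategy) of an equilibrium of the game truncated at `depth`.

    `strategy` maps every history (a tuple) with fewer than `depth` edges to the vertex
    chosen by its owner. Ties are broken in favour of the first successor.
    """
    players = sorted(goals)
    strategy = {}

    def solve(history, costs):
        vertex, index = history[-1], len(history) - 1
        # Record a first visit to a goal set at the current index.
        costs = tuple(index if c == math.inf and vertex in goals[p] else c
                      for p, c in zip(players, costs))
        if index == depth:                      # leaf of the truncated tree
            return costs
        mover = players.index(owner[vertex])
        best = None
        for w in edges[vertex]:
            result = solve(history + (w,), costs)
            if best is None or result[mover] < best[0][mover]:
                best = (result, w)
        strategy[history] = best[1]
        return best[0]

    profile = solve((start,), tuple(math.inf for _ in players))
    return profile, strategy


EDGES = {"A": ["B", "D"], "B": ["A", "C"], "C": ["A"], "D": ["A"]}   # assumed edge set
OWNER = {"A": 1, "B": 2, "C": 1, "D": 1}
GOALS = {1: {"C"}, 2: {"D"}}
profile, strategy = backward_induction(EDGES, OWNER, GOALS, "A", 4)
print(profile, strategy[("A",)])
```

The recursion visits every history of the truncated tree, so its running time is exponential in the depth. Other tie-breaking rules may produce equilibria of a different type.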

Proposition 13 states that it is possible to extend a Nash equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ to
a Nash equilibrium in the game $\mathcal{T}$, if the depth $d$ is equal to $(|\Pi| + 1) \cdot 2 \cdot |V|$. We obtain
Theorem 10 as a consequence of Corollary 12 and Proposition 13.

Proposition 13. Let $\mathcal{G}$ be a game and $T$ be the unraveling of $G$. Let $\mathrm{Trunc}_d(\mathcal{T})$ be the game
played on the truncated tree of $T$ of depth $d = (|\Pi| + 1) \cdot 2 \cdot |V|$. If there exists a Nash equilibrium
in the game $\mathrm{Trunc}_d(\mathcal{T})$, then there exists a finite-memory Nash equilibrium in the game $\mathcal{T}$.

The proof of Proposition 13 roughly works as follows. Let $(\sigma_i)_{i \in \Pi}$ be a Nash equilibrium
in $\mathrm{Trunc}_d(\mathcal{T})$. A well-chosen prefix $\alpha\beta$, with $\beta$ being a cycle, is first extracted from the
outcome $\rho$ of $(\sigma_i)_{i \in \Pi}$. The outcome of the required Nash equilibrium $(\tau_i)_{i \in \Pi}$ in $\mathcal{T}$ will be equal
to $\alpha\beta^\omega$. As soon as a player deviates from this play, all the other players form a coalition to
punish him in a way that this deviation is not profitable for him. These ideas are detailed in
Lemmas 15 and 16. One can see Lemma 15 as a technical result used to prove Lemma 16,
which is the main ingredient to show Proposition 13. The proof of Lemma 15 relies on a
particular case (stated below) of Proposition 9. More precisely, we consider the qualitative
two-player zero-sum game $\mathcal{G}_j$ played on the graph $G$, where player $j$ plays in order to reach
his goal set $\mathrm{Goal}_j$, against the coalition of all other players that wants to prevent him from
reaching his goal set. Player $j$ plays on the vertices from $V_j$ and the coalition on $V \setminus V_j$.

Proposition 14 ([8]). Let $\mathcal{G}_j = (V, V_j, V \setminus V_j, E, \mathrm{Goal}_j)$ be the qualitative two-player
zero-sum reachability game associated to player $j$. Then player $j$ has a memoryless strategy $\nu_j$ that
enables him to reach $\mathrm{Goal}_j$ within $|V| - 1$ edges from each vertex $v$ from which he wins the
game $\mathcal{G}_j$. On the contrary, the coalition has a memoryless strategy $\nu_{-j}$ that forces the play to
stay in $V \setminus \mathrm{Goal}_j$ from each vertex $v$ from which it wins the game $\mathcal{G}_j$.

Lemma 15. Suppose $d > 0$. Let $(\sigma_i)_{i \in \Pi}$ be a Nash equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ and $\rho$ the (finite)
outcome of $(\sigma_i)_{i \in \Pi}$. Assume that $\rho$ has a prefix $\alpha\beta\gamma$, where $\beta$ contains at least one vertex,
such that

$$\begin{aligned} \mathrm{Visit}(\alpha) &= \mathrm{Visit}(\alpha\beta\gamma) \\ \mathrm{Last}(\alpha) &= \mathrm{Last}(\alpha\beta) \\ |\alpha\beta| &\leq l \cdot |V| \\ |\alpha\beta\gamma| &= (l + 1) \cdot |V| \end{aligned}$$

for some $l \geq 1$.

Let $j \in \Pi$ be such that $\alpha$ does not visit $\mathrm{Goal}_j$. Consider the qualitative two-player zero-sum
game $\mathcal{G}_j = (V, V_j, V \setminus V_j, E, \mathrm{Goal}_j)$. Then for all histories $hu$ of $\mathcal{G}$ consistent with $(\sigma_i)_{i \in \Pi \setminus \{j\}}$
and such that $|hu| \leq |\alpha\beta|$, the coalition of the players $i \neq j$ wins the game $\mathcal{G}_j$ from $u$.

Condition $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta\gamma)$ means that if $\mathrm{Goal}_i$ is visited by $\alpha\beta\gamma$, it has already been
visited by $\alpha$. Condition $\mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta)$ means that $\beta$ is a cycle. The play $\rho$ of Lemma 15
is illustrated in Figure 3.

Lemma 15 says in particular that the players $i \neq j$ can play together to prevent player $j$
from reaching his goal set $\mathrm{Goal}_j$, in case he deviates from the play $\alpha\beta$ (as $\alpha\beta$ is consistent
with $(\sigma_i)_{i \in \Pi \setminus \{j\}}$). We denote by $\nu_{-j}$ the memoryless winning strategy of the coalition. For
each player $i \neq j$, let $\nu_{i,j}$ be the memoryless strategy of player $i$ in $\mathcal{G}$ induced by $\nu_{-j}$.

Proof (of Lemma 15). By contradiction suppose that player $j$ wins the game $\mathcal{G}_j$ from $u$. By
Proposition 14 player $j$ has a memoryless winning strategy $\nu_j$ which enables him to reach
his goal set $\mathrm{Goal}_j$ within at most $|V| - 1$ edges from $u$. We show that $\nu_j$ leads to a profitable
deviation for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$ in the game $\mathrm{Trunc}_d(\mathcal{T})$, which is impossible by hypothesis.

Let $\rho'$ be a play in $\mathrm{Trunc}_d(\mathcal{T})$ such that $hu$ is a prefix of $\rho'$, and from $u$, player $j$ plays
according to the strategy $\nu_j$ and the other players $i \neq j$ continue to play according to $\sigma_i$. As
the play $\rho'$ is consistent with the memoryless winning strategy $\nu_j$ from $u$, it visits $\mathrm{Goal}_j$ and
we have

$$\begin{aligned} \mathrm{Cost}_j(\rho') &\leq |hu| + |V| && \text{(by Proposition 14)} \\ &\leq (l + 1) \cdot |V| && \text{(by hypothesis)} \\ &\leq d && \text{(as } \alpha\beta\gamma \leq \rho\text{).} \end{aligned}$$

We consider the following two cases. If $\mathrm{Cost}_j(\rho) = +\infty$ (i.e. $\rho$ does not visit $\mathrm{Goal}_j$), we
have

$$\mathrm{Cost}_j(\rho') < \mathrm{Cost}_j(\rho) = +\infty.$$

On the contrary, if $\mathrm{Cost}_j(\rho) < +\infty$ (i.e. $\rho$ visits $\mathrm{Goal}_j$, but after the prefix $\alpha\beta\gamma$ by hypothesis),
then we have

$$\mathrm{Cost}_j(\rho') < \mathrm{Cost}_j(\rho)$$

as $\mathrm{Cost}_j(\rho) > (l + 1) \cdot |V|$.

Since $\rho'$ is consistent with $(\sigma_i)_{i \in \Pi \setminus \{j\}}$, the strategy of player $j$ induced by the play $\rho'$ is a
profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$ in both cases, which is a contradiction. □
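The four conditions that Lemma 15 imposes on the prefix $\alpha\beta\gamma$ are purely combinatorial and can be tested on a candidate decomposition. The Python sketch below is only a checker, written for illustration: it does not find the decomposition and does not verify that the sequence is the outcome of a Nash equilibrium. The names `visit` and `satisfies_lemma15_conditions` are hypothetical, sequences of vertices are given as strings, and the goal sets are those of Example 5.

```python
def visit(history, goals):
    """Set of players whose goal set is visited by the history."""
    return {i for i, goal in goals.items() if any(v in goal for v in history)}


def satisfies_lemma15_conditions(alpha, beta, gamma, goals, num_vertices, l):
    """Check the four conditions on alpha.beta.gamma, for given |V| and l."""
    ab = list(alpha) + list(beta)
    abg = ab + list(gamma)

    def length(history):
        return len(history) - 1                 # the length is the number of edges

    return (
        len(beta) >= 1                                   # beta contains at least one vertex
        and visit(alpha, goals) == visit(abg, goals)     # Visit(alpha) = Visit(alpha beta gamma)
        and alpha[-1] == ab[-1]                          # Last(alpha) = Last(alpha beta)
        and length(ab) <= l * num_vertices               # |alpha beta| <= l.|V|
        and length(abg) == (l + 1) * num_vertices        # |alpha beta gamma| = (l+1).|V|
    )


GOALS = {1: {"C"}, 2: {"D"}}
print(satisfies_lemma15_conditions("AD", "AD", "ADADA", GOALS, 4, 1))
print(satisfies_lemma15_conditions("A", "DA", "DADADA", GOALS, 4, 1))
```

In the second call the goal set of player 2 is first visited after $\alpha$, so the condition on $\mathrm{Visit}$ fails.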

Now that we have proved Lemma I15i we use it in order to obtain Lemma 1161 which 
states that one can define a Nash equilibrium {Ti)i^ij in the game T, based on the Nash 
equilibrium {ai)i^n in the game 'Tru<nC(i[T). 

Lemma 16. Suppose d > 0. Let {(Ji)i^n be a Nash equilibrium in Trunc^(7~) and al3"f be a 
prefix of p = {{<Ji)i(£n) «s defined in Lemma \T5l Then there exists a Nash equilibrium {Ti)i^]j 
in the game T . Moreover (Ti)i^Yi is finite-memory, and Type((Ti)jg/7) = Visit(Q!). 

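Proof. Let us set $\Pi = \{1, \ldots, n\}$. As $\alpha$ and $\beta$ end in the same vertex, we can consider the
infinite play $\alpha\beta^\omega$ in the game $\mathcal{T}$. Without loss of generality we can order the players $i \in \Pi$ so
that

$$\forall i \le k \quad \mathrm{Cost}_i(\alpha\beta^\omega) < +\infty \quad (\alpha \text{ visits } \mathrm{Goal}_i)$$

$$\forall i > k \quad \mathrm{Cost}_i(\alpha\beta^\omega) = +\infty \quad (\alpha \text{ does not visit } \mathrm{Goal}_i)$$

where $0 \le k \le n$. In the second case, notice that $\rho$ could visit $\mathrm{Goal}_i$ (but after the prefix $\alpha\beta\gamma$).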

The Nash equilibrium $(\tau_i)_{i\in\Pi}$ required by Lemma 16 is intuitively defined as follows. First,
the outcome of $(\tau_i)_{i\in\Pi}$ is exactly $\alpha\beta^\omega$. Secondly, the first player $j$ who deviates from $\alpha\beta^\omega$ is
punished by the coalition of the other players in the following way. If $j \le k$ and the deviation
occurs in the tree $\mathrm{Trunc}_d(\mathcal{T})$, then the coalition plays according to $(\sigma_i)_{i\in\Pi\setminus\{j\}}$ in this tree.
This prevents player $j$ from reaching his goal set $\mathrm{Goal}_j$ faster than in $\alpha\beta^\omega$. And if $j > k$, the
coalition plays according to $(\nu_{i,j})_{i\in\Pi\setminus\{j\}}$ (given by Lemma 15) so that player $j$ does not reach
his goal set at all.

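We begin by defining a punishment function $P$ on the vertex set $H$ of $\mathcal{T}$ such that $P(h)$
indicates the first player $j$ who has deviated from $\alpha\beta^\omega$, with respect to $h$. We write $P(h) = \bot$ if
no deviation has occurred, and $h \le \rho$ when $h$ is a prefix of $\rho$. For $v_0$, we define $P(v_0) = \bot$, and for $h \in H$ such that $\mathrm{Last}(h) \in V_i$
and $v \in V$, we let:

$$P(hv) = \begin{cases}
\bot & \text{if } P(h) = \bot \text{ and } hv \le \alpha\beta^\omega, \\
i & \text{if } P(h) = \bot \text{ and } hv \not\le \alpha\beta^\omega, \\
P(h) & \text{otherwise } (P(h) \neq \bot).
\end{cases}$$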
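
The punishment function only needs to locate the first position at which a history leaves $\alpha\beta^\omega$ and to report the owner of the vertex where the deviating move was chosen. A minimal Python sketch, under assumptions that are not in the paper: histories, `alpha` and `beta` are lists of vertices, `owner` is a hypothetical dictionary mapping each vertex to the player controlling it, and `BOTTOM` stands for $\bot$.

```python
BOTTOM = None  # plays the role of the symbol ⊥ (no deviation so far)


def lasso_prefix(alpha, beta, length):
    """Return the first `length` vertices of the play alpha beta^omega."""
    if not beta:
        raise ValueError("beta must contain at least one vertex")
    vertices = list(alpha)
    while len(vertices) < length:
        vertices.extend(beta)
    return vertices[:length]


def punishment(history, alpha, beta, owner):
    """Return P(history): the first deviating player, or BOTTOM."""
    expected = lasso_prefix(alpha, beta, len(history))
    if history and history[0] != expected[0]:
        raise ValueError("histories must start in the initial vertex")
    for position in range(1, len(history)):
        if history[position] != expected[position]:
            # The move leaving the play was chosen by the owner of the
            # vertex visited just before the first mismatch.
            return owner[history[position - 1]]
    return BOTTOM


# Hypothetical example: alpha = AB, beta = CB, so the play is ABCBCB...
alpha, beta = ["A", "B"], ["C", "B"]
owner = {"A": 1, "B": 2, "C": 1}
```

Because the scan stops at the first mismatch, later moves of the history never change the result, which mirrors the third case of the definition of $P$.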

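The Nash equilibrium $(\tau_i)_{i\in\Pi}$ is then defined as follows: let $h$ be a history ending in a
vertex of $V_i$,

$$\tau_i(h) = \begin{cases}
v & \text{if } P(h) = \bot\ (h \le \alpha\beta^\omega), \text{ such that } hv \le \alpha\beta^\omega, \\
\text{arbitrary} & \text{if } P(h) = i, \\
\nu_{i,P(h)}(h) & \text{if } P(h) \neq \bot, i \text{ and } P(h) > k, \\
\sigma_i(h) & \text{if } P(h) \neq \bot, i,\ P(h) \le k \text{ and } |h| < d, \\
\text{arbitrary} & \text{otherwise } (P(h) \neq \bot, i,\ P(h) \le k \text{ and } |h| \ge d)
\end{cases} \qquad (1)$$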

where arbitrary means that the next vertex is chosen arbitrarily (in a memoryless way). Clearly
the outcome of $(\tau_i)_{i\in\Pi}$ is the play $\alpha\beta^\omega$, and $\mathrm{Type}((\tau_i)_{i\in\Pi})$ is equal to $\mathrm{Visit}(\alpha)$ ($= \mathrm{Visit}(\alpha\beta)$).

It remains to prove that $(\tau_i)_{i\in\Pi}$ is a finite-memory Nash equilibrium in the game $\mathcal{T}$. We
first show that the strategy profile $(\tau_i)_{i\in\Pi}$ defined in Equation (1) is a Nash equilibrium in
the game $\mathcal{T}$. Let $\tau'_j$ be a strategy of player $j$. We show that this is not a profitable deviation
for player $j$ w.r.t. $(\tau_i)_{i\in\Pi}$. We distinguish the following two cases:

(i) $j \le k$ ($\mathrm{Cost}_j(\alpha\beta^\omega) < +\infty$, $\alpha$ visits $\mathrm{Goal}_j$).

To improve his cost, player $j$ has no incentive to deviate after the prefix $\alpha$. Thus we assume
that the strategy $\tau'_j$ causes a deviation from a vertex visited in $\alpha$. By Equation (1) the
other players first play according to $(\sigma_i)_{i\in\Pi\setminus\{j\}}$ in $\mathrm{Trunc}_d(\mathcal{T})$, and then in an arbitrary
way.

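Suppose that $\tau'_j$ is a profitable deviation for player $j$ w.r.t. $(\tau_i)_{i\in\Pi}$ in the game $\mathcal{T}$. Let us
set $\pi = \langle(\tau_i)_{i\in\Pi}\rangle$ and $\pi' = \langle\tau'_j, (\tau_i)_{i\in\Pi\setminus\{j\}}\rangle$. Then

$$\mathrm{Cost}_j(\pi') < \mathrm{Cost}_j(\pi).$$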

On the other hand, since $\alpha$ visits $\mathrm{Goal}_j$, we know that

$$\mathrm{Cost}_j(\pi) = \mathrm{Cost}_j(\rho) \le |\alpha|.$$

So if we limit the play $\pi'$ in $\mathcal{T}$ to its prefix of length $d$, we get a play $\rho'$ in $\mathrm{Trunc}_d(\mathcal{T})$ such
that

$$\mathrm{Cost}_j(\rho') = \mathrm{Cost}_j(\pi') < \mathrm{Cost}_j(\rho).$$

As the play $\rho'$ is consistent with the strategies $(\sigma_i)_{i\in\Pi\setminus\{j\}}$ by Equation (1), the strategy $\tau'_j$
restricted to the tree $\mathrm{Trunc}_d(\mathcal{T})$ is a profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$ in the
game $\mathrm{Trunc}_d(\mathcal{T})$. This contradicts the fact that $(\sigma_i)_{i\in\Pi}$ is a Nash equilibrium in this game.

(ii) $j > k$ ($\mathrm{Cost}_j(\alpha\beta^\omega) = +\infty$, $\alpha\beta^\omega$ does not visit $\mathrm{Goal}_j$).

If player $j$ deviates from $\alpha\beta^\omega$ (with the strategy $\tau'_j$), by Equation (1) the other players
combine against him and play according to $\nu_{-j}$. By Lemma 15 this coalition wins the
game $\mathcal{G}_j$ from any vertex visited by $\alpha\beta^\omega$. So the strategy of the coalition keeps the
play $\langle\tau'_j, (\tau_i)_{i\in\Pi\setminus\{j\}}\rangle$ away from the set $\mathrm{Goal}_j$, whatever player $j$ does. Therefore $\tau'_j$ is not
a profitable deviation for player $j$ w.r.t. $(\tau_i)_{i\in\Pi}$ in the game $\mathcal{T}$.

We now prove that $(\tau_i)_{i\in\Pi}$ is a finite-memory strategy profile. According to the definition
of finite-memory strategy (see Section 2) we have to prove that each relation $\approx_{\tau_i}$ on $H$ has
finite index (recall that $h \approx_{\tau_i} h'$ if $\tau_i(h\delta) = \tau_i(h'\delta)$ for all $\delta \in V^*V_i$). To this end we define for
each player $i$ an equivalence relation $\sim_{\tau_i}$ with finite index such that

$$\forall h, h' \in H, \quad h \sim_{\tau_i} h' \Rightarrow h \approx_{\tau_i} h'.$$

We first define an equivalence relation $\sim_P$ with finite index related to the punishment
function $P$. For all prefixes $h, h'$ of $\alpha\beta^\omega$, i.e. such that no player is punished, this relation
does not distinguish two histories that are identical except for a certain number of cycles $\beta$.
For the other histories it just has to remember the first player, say $i$, who has deviated. The
definition of $\sim_P$ is as follows:

    $h \sim_P h'$ if $h = \alpha\beta^l\beta'$, $h' = \alpha\beta^m\beta'$, $\beta' \le \beta$, $l, m \ge 0$

    $hv \sim_P h'v'$ if $v, v' \in V_i$, $h, h' \le \alpha\beta^\omega$, but $hv, h'v' \not\le \alpha\beta^\omega$

    $hv \sim_P hv\delta$ if $h \le \alpha\beta^\omega$, $hv \not\le \alpha\beta^\omega$, $\delta \in V^*$.

The relation $\sim_P$ is an equivalence relation on $H$ with finite index.

We now turn to the definition of $\sim_{\tau_i}$. It is based on the definition of $\tau_i$ (given in (1))
and $\sim_P$. To get an equivalence with finite index we proceed as follows. Recall that each
strategy $\nu_{i,P(h)}$ is memoryless and when a player plays arbitrarily, his strategy is also
memoryless. Furthermore notice that, in the definition of $\tau_i$, the strategy $\sigma_i$ is only applied to
histories $h$ with length $|h| < d$. For histories $h$ such that $\tau_i(h) = v$ with $hv \le \alpha\beta^\omega$, it is enough
to remember information with respect to $\alpha\beta$ as already done for $\sim_P$. Therefore for $h, h' \in H$
we define $\sim_{\tau_i}$ in the following way:

    $h \sim_{\tau_i} h'$ if $h \sim_P h'$ and ( $P(h) = \bot$
        or $P(h) = i$ and $\mathrm{Last}(h) = \mathrm{Last}(h')$
        or $P(h) \neq \bot, i$, $P(h) > k$ and $\mathrm{Last}(h) = \mathrm{Last}(h')$
        or $P(h) \neq \bot, i$, $P(h) \le k$, $|h|, |h'| \ge d$ and $\mathrm{Last}(h) = \mathrm{Last}(h')$ ).

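Notice that this relation satisfies

$$h \sim_{\tau_i} h' \Rightarrow \tau_i(h) = \tau_i(h') \text{ and } \mathrm{Last}(h) = \mathrm{Last}(h')$$

and has finite index. Therefore if $h \sim_{\tau_i} h'$, then $h \approx_{\tau_i} h'$, and the relation $\approx_{\tau_i}$ has finite index.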

□ 

We can now proceed to the proof of Proposition 13.

Proof (of Proposition 13). Let us set $\Pi = \{1, \ldots, n\}$ and $d = (n+1) \cdot 2 \cdot |V|$. Let $(\sigma_i)_{i\in\Pi}$ be
a Nash equilibrium in the game $\mathrm{Trunc}_d(\mathcal{T})$ and $\rho$ its outcome.

To be able to use Lemma 16, we consider the prefix $pq$ of $\rho$ of minimal length such that

$$\exists\, l \ge 1: \quad |p| = (l-1) \cdot |V|, \quad |pq| = (l+1) \cdot |V|, \quad \mathrm{Visit}(p) = \mathrm{Visit}(pq). \qquad (2)$$

The following statements are true.

• $l \le 2 \cdot n + 1$.

• If $\mathrm{Visit}(p) \subsetneq \mathrm{Visit}(\rho)$, then $l < 2 \cdot n + 1$.

Indeed the first statement results from the fact that in the worst case, the play $\rho$ visits the
goal set of a new player in each prefix of length $i \cdot 2 \cdot |V|$, $1 \le i \le n$, i.e. $|p| = n \cdot 2 \cdot |V|$. It
follows that $pq$ exists as a prefix of $\rho$, because the length $d$ of $\rho$ is equal to $(n+1) \cdot 2 \cdot |V|$ by
hypothesis. Thus $\mathrm{Visit}(p) \subseteq \mathrm{Visit}(\rho)$. Suppose that there exists $i \in \mathrm{Visit}(\rho) \setminus \mathrm{Visit}(p)$; then $\rho$
visits $\mathrm{Goal}_i$ after the prefix $pq$ by Equation (2). The second statement follows easily.

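Given the length of $q$, one vertex of $V$ is visited at least twice by $q$. More precisely, we can
write

$$pq = \alpha\beta\gamma \quad \text{with} \quad \mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta), \quad |\alpha| \ge (l-1) \cdot |V|, \quad |\alpha\beta| \le l \cdot |V|.$$

In particular, $|p| \le |\alpha|$. See Figure 3. We have $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta\gamma)$, and $|\alpha\beta\gamma| = (l+1) \cdot |V|$.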

As the hypotheses of Lemma 16 are verified, we can apply it in this context to get a
finite-memory Nash equilibrium $(\tau_i)_{i\in\Pi}$ in the game $\mathcal{T}$ with $\mathrm{Type}((\tau_i)_{i\in\Pi}) = \mathrm{Visit}(\alpha)$. □
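
The slicing $pq = \alpha\beta\gamma$ relies on a pigeonhole argument: any segment containing more vertices than $|V|$ must repeat a vertex. The Python sketch below illustrates how such a repetition could be located in a finite prefix given as a list of vertices. The function name `split_cycle` and the parameter `start` (the position from which the repetition is searched) are assumptions made for this illustration, and the sketch does not attempt to reproduce the exact length bounds of the proof.

```python
def split_cycle(prefix, start):
    """Split `prefix` as alpha + beta + gamma with Last(alpha) == Last(alpha beta).

    The repeated vertex is searched among positions >= start, so alpha
    has more than `start` vertices. Returns None if no vertex repeats.
    """
    first_seen = {}
    for position in range(start, len(prefix)):
        vertex = prefix[position]
        if vertex in first_seen:
            cut = first_seen[vertex]
            alpha = prefix[:cut + 1]             # ends at the first visit
            beta = prefix[cut + 1:position + 1]  # ends at the second visit
            gamma = prefix[position + 1:]        # what is left of the prefix
            return alpha, beta, gamma
        first_seen[vertex] = position
    return None


# Hypothetical prefix: the vertex B is repeated at positions 1 and 4.
example = split_cycle(list("ABCDBEA"), 1)
```

The returned `beta` is non-empty by construction, so it can serve as the cycle that is repeated forever in $\alpha\beta^\omega$.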



Fig. 3. Slicing of the play $\rho$ in the tree $\mathrm{Trunc}_d(\mathcal{T})$.



Proposition 13 asserts that given a game $\mathcal{G}$ and the game $\mathrm{Trunc}_d(\mathcal{T})$ played on the truncated
tree of $\mathcal{T}$ of a well-chosen depth $d$, one can lift any Nash equilibrium $(\sigma_i)_{i\in\Pi}$ of $\mathrm{Trunc}_d(\mathcal{T})$ to
a Nash equilibrium $(\tau_i)_{i\in\Pi}$ of $\mathcal{G}$. The proof of Proposition 13 states that the type of $(\tau_i)_{i\in\Pi}$
is equal to $\mathrm{Visit}(\alpha)$. We give an example showing that this lifting cannot, in general, preserve
the type of the Nash equilibrium $(\sigma_i)_{i\in\Pi}$.

Example 17. Let us consider the two-player game $\mathcal{G}$ depicted in Figure 4, with $\mathrm{Goal}_1 =
\{C\}$ and $\mathrm{Goal}_2 = \{E\}$. One can show that $\mathcal{G}$ admits only Nash equilibria of type $\{2\}$ or $\emptyset$.
Indeed, on one hand, there is no play of $\mathcal{G}$ where both goals are visited, and on the other
hand, given a strategy profile $(\sigma_i)_{i\in\Pi}$ such that $\langle(\sigma_i)_{i\in\Pi}\rangle$ visits $\mathrm{Goal}_1$ (i.e. $\langle(\sigma_i)_{i\in\Pi}\rangle$ is of the
form $A^+BC^\omega$), playing $D$ instead of $C$ is clearly a profitable deviation for player 2.

We will now see that for each $d \ge 2$ the game played on $\mathrm{Trunc}_d(\mathcal{T})$ admits a Nash
equilibrium of type $\{1\}$. From the above discussion, this equilibrium cannot be lifted to a Nash
equilibrium of the same type in $\mathcal{G}$. A truncated tree $\mathrm{Trunc}_d(\mathcal{T})$ is depicted in Figure 5. One
can show that the strategy profile leading to the outcome $A^{d-1}BC$ (depicted in bold in the
figure) is a Nash equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ of type $\{1\}$. Following the lines of the proof of
Proposition 13, we see that this Nash equilibrium is lifted to a Nash equilibrium of $\mathcal{G}$ with
outcome $A^\omega$ and type $\emptyset$.

On the other hand, notice that from the proof of Proposition 13, we can construct a Nash
equilibrium such that each player pays either an infinite cost, or a cost bounded by $|\Pi| \cdot 2 \cdot |V|$.

3.2 Nash Equilibria with Finite Memory Preserving Types 

In this section we show that given a Nash equilibrium, we can construct another Nash
equilibrium with the same type such that all its strategies are finite-memory. This answers
Problem 2 for Nash equilibria.

Theorem 18. If there exists a Nash equilibrium in a quantitative multiplayer reachability
game $\mathcal{G}$, then there exists a finite-memory Nash equilibrium of the same type in $\mathcal{G}$.

The proof is based on two steps. The first step constructs from $(\sigma_i)_{i\in\Pi}$ another Nash
equilibrium $(\tau_i)_{i\in\Pi}$ with the same type such that the play $\langle(\tau_i)_{i\in\Pi}\rangle$ is of the form $\alpha\beta^\omega$
with $\mathrm{Visit}(\alpha) = \mathrm{Type}((\sigma_i)_{i\in\Pi})$. This is possible thanks to Lemmas 19 and 20, by first
eliminating unnecessary cycles in the play $\langle(\sigma_i)_{i\in\Pi}\rangle$ and then locating a prefix $\alpha\beta$ such that $\beta$ is
a cycle that can be infinitely repeated.



Fig. 4. A game $\mathcal{G}$.

Fig. 5. The truncated tree $\mathrm{Trunc}_d(\mathcal{T})$.



The second step transforms the Nash equilibrium $(\tau_i)_{i\in\Pi}$ into a finite-memory one thanks
to Lemma 16 given in Section 3.1. For that purpose, we consider the strategy profile $(\tau_i)_{i\in\Pi}$
limited to the tree $\mathcal{T}$ truncated at a well-chosen depth.

The next lemma indicates how to eliminate a cycle in the outcome of a Nash equilibrium. 

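Lemma 19. Let $(\sigma_i)_{i\in\Pi}$ be a strategy profile in a game $\mathcal{G}$ and $\rho = \langle(\sigma_i)_{i\in\Pi}\rangle$ its outcome.
Suppose that $\rho = pq\tilde{\rho}$, where $q$ contains at least one vertex, such that

$$\mathrm{Visit}(p) = \mathrm{Visit}(pq), \qquad \mathrm{Last}(p) = \mathrm{Last}(pq).$$

We define a strategy profile $(\tau_i)_{i\in\Pi}$ as follows:

$$\tau_i(h) = \begin{cases}
\sigma_i(h) & \text{if } p \not\le h, \\
\sigma_i(pq\delta) & \text{if } h = p\delta
\end{cases}$$

where $h$ is a history of $\mathcal{G}$ with $\mathrm{Last}(h) \in V_i$. We get the outcome $\langle(\tau_i)_{i\in\Pi}\rangle = p\tilde{\rho}$.

If a strategy $\tau'_j$ is a profitable deviation for player $j$ w.r.t. $(\tau_i)_{i\in\Pi}$, then there exists a profitable
deviation $\sigma'_j$ for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$.

Proof. Let us set $\Pi = \{1, \ldots, n\}$. We write

    $\rho = \langle(\sigma_i)_{i\in\Pi}\rangle$ of cost profile $(x_1, \ldots, x_n)$,
    $\pi = \langle(\tau_i)_{i\in\Pi}\rangle$ of cost profile $(y_1, \ldots, y_n)$.

We observe that as $\rho = pq\tilde{\rho}$, we have $\pi = p\tilde{\rho}$ (see Figures 6 and 7). It follows that

$$\forall i \in \Pi, \quad y_i \le x_i. \qquad (3)$$

More precisely,

- if $x_i = +\infty$, then $y_i = +\infty$; (4)
- if $x_i < +\infty$ and $i \in \mathrm{Visit}(p)$, then $y_i = x_i$;
- if $x_i < +\infty$ and $i \notin \mathrm{Visit}(p)$, then $y_i = x_i - (|q| + 1)$. (5)

Fig. 6. Play $\rho$ and possible deviations. Fig. 7. Play $\pi$ and possible deviations.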

Let $\tau'_j$ be a profitable deviation for player $j$ w.r.t. $(\tau_i)_{i\in\Pi}$, and $\pi'$ be the outcome of the
strategy profile $(\tau'_j, (\tau_i)_{i\in\Pi\setminus\{j\}})$. Then

$$\mathrm{Cost}_j(\pi') < y_j.$$

We show how to construct a profitable deviation $\sigma'_j$ for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$. Two cases
occur:

(i) player $j$ deviates from $\pi$ just after a proper prefix $h$ of $p$ (like for the play $\pi'_1$ in Figure 7).

We define $\sigma'_j = \tau'_j$ and we denote by $\rho'$ the outcome of $(\sigma'_j, (\sigma_i)_{i\in\Pi\setminus\{j\}})$. Given the
definition of the strategy profile $(\tau_i)_{i\in\Pi}$, one can verify that $\rho' = \pi'$ (see the play $\rho'_1$ in Figure 6).
Thus

$$\mathrm{Cost}_j(\rho') = \mathrm{Cost}_j(\pi') < y_j \le x_j$$

by Equation (3), which implies that $\sigma'_j$ is a profitable deviation of player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$.

(ii) player $j$ deviates from $\pi$ after the prefix $p$ ($\pi$ and $\pi'$ coincide at least on $p$).

This case is illustrated by the play $\pi'_2$ in Figure 7. We define for all histories $h$ ending in
a vertex of $V_j$:

$$\sigma'_j(h) = \begin{cases}
\sigma_j(h) & \text{if } pq \not\le h, \\
\tau'_j(p\delta) & \text{if } h = pq\delta.
\end{cases}$$

Let us set $\rho' = \langle\sigma'_j, (\sigma_i)_{i\in\Pi\setminus\{j\}}\rangle$. As player $j$ deviates after $p$ with the strategy $\tau'_j$, one can
prove that

$$\pi' = p\tilde{\pi}' \quad \text{and} \quad \rho' = pq\tilde{\pi}'$$

by definition of $(\tau_i)_{i\in\Pi}$ (see the play $\rho'_2$ in Figure 6). As $\mathrm{Cost}_j(\pi') < y_j$, it means that $j \notin
\mathrm{Visit}(p)$ (otherwise the deviation would not be profitable for player $j$). Since $\mathrm{Visit}(p) =
\mathrm{Visit}(pq)$, we also have

$$\mathrm{Cost}_j(\pi') + (|q| + 1) = \mathrm{Cost}_j(\rho').$$

By Equations (4) and (5), we get

- either $x_j = y_j = +\infty$ and $\mathrm{Cost}_j(\rho') < x_j$,
- or $x_j = y_j + (|q| + 1)$ and $\mathrm{Cost}_j(\rho') < x_j$,

which proves that $\sigma'_j$ is a profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$. □
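
The effect of removing the cycle $q$ on the cost profile can be checked on a small instance. The sketch below is an illustration under assumptions that do not come from the paper: a finite list of vertices stands in for an infinite play, the cost of a player is taken to be the index of the first visit to his goal set (the cost function itself is defined outside this section), and the names `cost` and `remove_cycle` are hypothetical. With this encoding the cost of a player who reaches his goal only after the cycle drops by the number of removed vertices, which corresponds to the quantity written $|q| + 1$ in the proof.

```python
import math


def cost(play, goal):
    """Index of the first vertex of `play` lying in `goal`, or +infinity."""
    for index, vertex in enumerate(play):
        if vertex in goal:
            return index
    return math.inf


def remove_cycle(play, p_len, q_len):
    """Drop the q_len vertices that follow the first p_len vertices."""
    return play[:p_len] + play[p_len + q_len:]


# Hypothetical finite stand-in for a play p q rho~ with p = AB and q = CB,
# so that Last(p) = Last(pq) = B and no new goal is visited inside q.
play = list("ABCBDE")
goals = {1: {"B"}, 2: {"E"}, 3: {"Z"}}
shorter = remove_cycle(play, p_len=2, q_len=2)
x = {i: cost(play, goal) for i, goal in goals.items()}
y = {i: cost(shorter, goal) for i, goal in goals.items()}
```

Here player 1 keeps his cost, player 2 reaches his goal two steps earlier, and player 3 keeps an infinite cost, which matches the three cases listed before Equation (5).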






While Lemma 19 deals with elimination of unnecessary cycles, Lemma 20 deals with
repetition of a useful cycle.

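Lemma 20. Let $(\sigma_i)_{i\in\Pi}$ be a strategy profile in a game $\mathcal{G}$ and $\rho = \langle(\sigma_i)_{i\in\Pi}\rangle$ its outcome. We
assume that $\rho = pq\tilde{\rho}$, where $q$ contains at least one vertex, such that

$$\mathrm{Visit}(p) = \mathrm{Visit}(\rho), \qquad \mathrm{Last}(p) = \mathrm{Last}(pq).$$

We define a strategy profile $(\tau_i)_{i\in\Pi}$ as follows:

$$\tau_i(h) = \begin{cases}
\sigma_i(h) & \text{if } p \not\le h, \\
\sigma_i(p\delta) & \text{if } h = pq^k\delta,\ k \in \mathbb{N}, \text{ and } q \not\le \delta
\end{cases}$$

where $h$ is a history of $\mathcal{G}$ with $\mathrm{Last}(h) \in V_i$. We get the outcome $\langle(\tau_i)_{i\in\Pi}\rangle = pq^\omega$.

If a strategy $\tau'_j$ is a profitable deviation for player $j$ w.r.t. $(\tau_i)_{i\in\Pi}$, then there exists a profitable
deviation $\sigma'_j$ for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$.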

Proof. We use the same notations as in the proof of Lemma 19. Here we have $x_i = y_i$ for
all $i \in \Pi$ since $\mathrm{Visit}(p) = \mathrm{Visit}(\rho)$. One can prove that $\pi = pq^\omega$ (see Figures 8 and 9).





Fig. 8. Play $\rho = \langle(\sigma_i)_{i\in\Pi}\rangle$ and its prefix $pq$.

Fig. 9. Play $\pi = \langle(\tau_i)_{i\in\Pi}\rangle = pq^\omega$.



We show how to define a profitable deviation $\sigma'_j$ from the deviation $\tau'_j$. We distinguish the
following two cases:

(i) player $j$ deviates from $\pi$ just after a proper prefix $h$ of $pq$.

We define $\sigma'_j = \tau'_j$. As in the first case of the proof of Lemma 19, we have $\mathrm{Cost}_j(\rho') < x_j$,
which implies that $\sigma'_j$ is a profitable deviation of player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$.

(ii) player $j$ deviates from $\pi$ after the prefix $pq$, i.e. after a prefix $pq^k$ and strictly before the
prefix $pq^{k+1}$ ($k \ge 1$).

We define for all histories $h$ ending in a vertex of $V_j$:

$$\sigma'_j(h) = \begin{cases}
\sigma_j(h) & \text{if } p \not\le h, \\
\tau'_j(pq^k\delta) & \text{if } h = p\delta.
\end{cases}$$

One can prove that

$$\pi' = pq^k\tilde{\pi}' \quad \text{and} \quad \rho' = p\tilde{\pi}'.$$

Then, from the point of view of costs, we have

$$\mathrm{Cost}_j(\rho') \le \mathrm{Cost}_j(\pi') < y_j = x_j,$$

which proves that $\sigma'_j$ is a profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$. □

The next proposition achieves the first step of the proof of Theorem 18 as mentioned in
Section 3.2. It shows that one can construct from a Nash equilibrium another Nash equilibrium
with the same type and with an outcome of the form $\alpha\beta^\omega$. Its proof uses Lemmas 19 and 20.

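Proposition 21. Let $(\sigma_i)_{i\in\Pi}$ be a Nash equilibrium in a game $\mathcal{G}$. Then there exists a Nash
equilibrium $(\tau_i)_{i\in\Pi}$ with the same type and such that $\langle(\tau_i)_{i\in\Pi}\rangle = \alpha\beta^\omega$, where $\mathrm{Visit}(\alpha) =
\mathrm{Type}((\sigma_i)_{i\in\Pi})$ and $|\alpha\beta| \le (|\Pi| + 1) \cdot |V|$.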

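Proof. Let us set $\Pi = \{1, \ldots, n\}$ and $\rho = \langle(\sigma_i)_{i\in\Pi}\rangle$. Without loss of generality suppose that

$$\mathrm{Cost}(\rho) = (x_1, \ldots, x_n) \quad \text{such that} \quad x_1 \le \ldots \le x_k < +\infty \quad \text{and} \quad x_{k+1} = \ldots = x_n = +\infty$$

where $0 \le k \le n$. We consider two cases: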
(i) $x_1 > |V|$.

Then, there exists a prefix $pq$ of $\rho$, with $q$ containing at least one vertex, such that

$$|pq| < x_1, \qquad \mathrm{Visit}(p) = \mathrm{Visit}(pq), \qquad \mathrm{Last}(p) = \mathrm{Last}(pq).$$

We define the strategy profile $(\tau_i)_{i\in\Pi}$ as proposed in Lemma 19. By this lemma it is
actually a Nash equilibrium in $\mathcal{G}$. With $\pi = \langle(\tau_i)_{i\in\Pi}\rangle$, we have

$$\rho = pq\tilde{\rho} \quad \text{and} \quad \pi = p\tilde{\rho}.$$

Thus if the cost profile for the play $\pi$ is $(y_1, \ldots, y_n)$, we have

$$y_1 < x_1, \ldots, y_k < x_k; \qquad y_{k+1} = x_{k+1} = +\infty, \ldots, y_n = x_n = +\infty.$$

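(ii) $(x_{l+1} - x_l) > |V|$ for some $1 \le l \le k - 1$.

Then, there exists a prefix $pq$ of $\rho$, with $q$ containing at least one vertex, such that

$$x_l < |pq| < x_{l+1}, \qquad \mathrm{Visit}(p) = \mathrm{Visit}(pq) = \{1, \ldots, l\}, \qquad \mathrm{Last}(p) = \mathrm{Last}(pq).$$

We define the strategy profile $(\tau_i)_{i\in\Pi}$ given in Lemma 19. It is then a Nash equilibrium
in $\mathcal{G}$, and for $\pi = \langle(\tau_i)_{i\in\Pi}\rangle$, we have

$$\rho = pq\tilde{\rho} \quad \text{and} \quad \pi = p\tilde{\rho}.$$

Hence if the cost profile for the play $\pi$ is $(y_1, \ldots, y_n)$, we have

$$y_1 = x_1, \ldots, y_l = x_l; \qquad y_{l+1} < x_{l+1}, \ldots, y_k < x_k; \qquad y_{k+1} = x_{k+1} = +\infty, \ldots, y_n = x_n = +\infty.$$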

By applying the two previous cases finitely many times, we can assume without loss of
generality that $(\sigma_i)_{i\in\Pi}$ is a Nash equilibrium with a cost profile $(x_1, \ldots, x_n)$ such that

$$x_i \le i \cdot |V| \text{ for } i \le k; \qquad x_i = +\infty \text{ for } i > k.$$

Let us go further. We can write $\rho = \alpha\beta\tilde{\rho}$ such that

$$\mathrm{Visit}(\alpha) = \mathrm{Visit}(\rho), \qquad \mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta), \qquad |\alpha\beta| \le (k+1) \cdot |V| \le (n+1) \cdot |V|.$$

Indeed, the prefix $h$ of $\rho$ of length $(k+1) \cdot |V|$ visits each goal set $\mathrm{Goal}_i$, with $i \le k$, and after
the last visited goal set $\mathrm{Goal}_k$, there remain enough vertices to observe a cycle. Notice that $\mathrm{Visit}(\alpha) =
\mathrm{Visit}(\alpha\beta) = \mathrm{Visit}(\rho)$ ($= \mathrm{Type}((\sigma_i)_{i\in\Pi})$).

If we define the strategy profile $(\tau_i)_{i\in\Pi}$ as in Lemma 20, we get a Nash equilibrium in $\mathcal{G}$
with outcome $\alpha\beta^\omega$ and the same type as $(\sigma_i)_{i\in\Pi}$. □

We are now ready to prove Theorem 18.

Proof (of Theorem 18). Let us set $\Pi = \{1, \ldots, n\}$. Let $(\sigma_i)_{i\in\Pi}$ be a Nash equilibrium in the
game $\mathcal{G}$. The first step consists in constructing a Nash equilibrium as in Proposition 21. Let us
denote it again by $(\sigma_i)_{i\in\Pi}$. Let us set $\rho = \langle(\sigma_i)_{i\in\Pi}\rangle = \alpha\beta^\omega$ such that $\mathrm{Visit}(\alpha) = \mathrm{Type}((\sigma_i)_{i\in\Pi})$
and $|\alpha\beta| \le (n+1) \cdot |V|$. The strategy profile $(\sigma_i)_{i\in\Pi}$ is also a Nash equilibrium in the game $\mathcal{T}$
played on the unraveling $\mathcal{T}$ of $\mathcal{G}$.

For the second step we consider $\mathrm{Trunc}_d(\mathcal{T})$, the truncated tree of $\mathcal{T}$ of depth $d = (n+2) \cdot |V|$.
It is clear that the strategy profile $(\sigma_i)_{i\in\Pi}$ limited to this tree is also a Nash equilibrium
of $\mathrm{Trunc}_d(\mathcal{T})$.

We know that $|\alpha\beta| \le (n+1) \cdot |V|$ and we set $\gamma$ such that $\alpha\beta\gamma$ is a prefix of $\rho$ and $|\alpha\beta\gamma| =
(n+2) \cdot |V|$. Furthermore we have $\mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta)$ and $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta\gamma)$ (since $\mathrm{Visit}(\alpha) =
\mathrm{Type}(\rho)$). Then this prefix $\alpha\beta\gamma$ satisfies the properties described in Lemma 15 (by setting $l =
n + 1$). By Lemma 16 we conclude that there exists a Nash equilibrium $(\tau_i)_{i\in\Pi}$ with finite
memory such that $\mathrm{Type}((\tau_i)_{i\in\Pi}) = \mathrm{Visit}(\alpha)$, that is, with the same type as the initial Nash
equilibrium $(\sigma_i)_{i\in\Pi}$. □

4 Secure Equilibria 

In the previous section, we positively solved Problem 1 and Problem 2 for Nash equilibria.
We here solve these two problems for secure equilibria, but in two-player games only. The
main results are stated in Theorems 22 and 28 below. In this section, we exclusively consider
two-player games.

Theorem 22. In every quantitative two-player reachability game, there exists a finite-memory 
secure equilibrium. 



The proof of Theorem 22 is based on the same ideas as the proof of Theorem 10
(existence of a Nash equilibrium). By Kuhn's theorem (Theorem 11), there exists a secure
equilibrium in the game $\mathrm{Trunc}_d(\mathcal{T})$ played on the finite tree $\mathrm{Trunc}_d(\mathcal{T})$, for any depth $d$. By
choosing an adequate depth $d$, Proposition 25 makes it possible to extend this secure equilibrium to a
secure equilibrium in the infinite tree $\mathcal{T}$, and thus in $\mathcal{G}$.

The notion of secure equilibrium is based on the binary relations $\prec_j$ of Definition 3. One can
easily see that $\prec_j$ is not reflexive. To be able to apply Kuhn's theorem, it is more convenient to
define secure equilibria via a preference relation. Given two cost profiles $(x_1, x_2)$ and $(y_1, y_2)$:

$$(x_1, x_2) \preceq_j (y_1, y_2) \quad \text{iff} \quad (x_1, x_2) \prec_j (y_1, y_2) \ \vee\ (x_1 = y_1 \wedge x_2 = y_2).$$

The relation $\preceq_j$ is clearly a preference relation. We can now provide an equivalent definition
of secure equilibrium.

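Proposition 23. A strategy profile $(\sigma_1, \sigma_2)$ of a game $\mathcal{G}$ is a secure equilibrium iff for all
strategies $\sigma'_1$ of player 1 in $\mathcal{G}$, we have:

$$\mathrm{Cost}(\rho') \preceq_1 \mathrm{Cost}(\rho)$$

where $\rho = \langle\sigma_1, \sigma_2\rangle$ and $\rho' = \langle\sigma'_1, \sigma_2\rangle$, and symmetrically for all strategies $\sigma'_2$ of player 2.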
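
The two relations can be made concrete on pairs of costs. Definition 3 is not part of this section, so the sketch below assumes the form of $\prec_j$ suggested by the case analysis in the proof of Proposition 25: a profile is strictly better for player $j$ when his own cost is strictly lower, or when his own cost is unchanged and the cost of the other player is strictly higher. The function names are hypothetical, and infinite costs are represented by `math.inf`.

```python
import math


def prec(j, x, y):
    """x ≺_j y for cost profiles x = (x1, x2), y = (y1, y2), j in {1, 2}.

    Assumed reading: y is strictly better than x for player j.
    """
    own, other = j - 1, 2 - j
    return x[own] > y[own] or (x[own] == y[own] and x[other] < y[other])


def preceq(j, x, y):
    """x ⪯_j y: either x ≺_j y or the two profiles are equal."""
    return prec(j, x, y) or tuple(x) == tuple(y)


def is_profitable_deviation(j, cost_rho, cost_rho_prime):
    """True when moving from cost_rho to cost_rho_prime is ≺_j-profitable."""
    return prec(j, cost_rho, cost_rho_prime)
```

Under this reading, the condition of Proposition 23 for player 1 amounts to checking that `preceq(1, cost_rho_prime, cost_rho)` holds for the outcome of every deviation of player 1.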

Since $\preceq_1$ and $\preceq_2$ are preference relations, we get the next corollary by Kuhn's theorem.

Corollary 24. Let $\mathcal{G}$ be a quantitative two-player reachability game and $\mathcal{T}$ be the unraveling
of $\mathcal{G}$. Let $\mathrm{Trunc}_d(\mathcal{T})$ be the game played on the truncated tree of $\mathcal{T}$ of depth $d$, with $d \ge 0$.
Then there exists a secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$.

Now that we can guarantee the existence of secure equilibria in finite trees, it remains
to show how to lift them to infinite trees. The next proposition states that it is possible to
extend a secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ to a secure equilibrium in the game $\mathcal{T}$ with the same
type, if the depth $d$ is greater than or equal to $(|\Pi| + 1) \cdot 2 \cdot |V|$ and there are only two players. It
also says that we can construct a secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ from a secure equilibrium
in $\mathcal{T}$, while keeping the same type.

Proposition 25. Let $\mathcal{G}$ be a two-player game and $\mathcal{T}$ be the unraveling of $\mathcal{G}$.

(i) If there exists a secure equilibrium of a certain type in the game $\mathcal{T}$, then there exists a secure
equilibrium of the same type in the game $\mathrm{Trunc}_d(\mathcal{T})$, for some depth $d \ge (|\Pi| + 1) \cdot 2 \cdot |V|$.

(ii) If there exists a secure equilibrium of a certain type in the game $\mathrm{Trunc}_d(\mathcal{T})$, where $d \ge
(|\Pi| + 1) \cdot 2 \cdot |V|$, then there exists a finite-memory secure equilibrium of the same type in
the game $\mathcal{T}$.

To prove Proposition 25 we need the following technical lemma, whose hypotheses are the
same as in Lemma 15. Recall that Lemma 15 states that for all $j \in \Pi$ such that $\alpha$ does not
visit $\mathrm{Goal}_j$, the players $i \neq j$ can play together to prevent player $j$ from reaching his goal
set $\mathrm{Goal}_j$ from any history $hu$ consistent with $(\sigma_i)_{i\in\Pi\setminus\{j\}}$ and such that $|hu| \le |\alpha\beta|$. We
denote by $\nu_{-j}$ the memoryless winning strategy of the coalition, and for each player $i \neq j$, by $\nu_{i,j}$
the memoryless strategy of player $i$ in $\mathcal{G}$ induced by $\nu_{-j}$.

Footnote. Remark that $\preceq_j$ is a kind of lexicographic order on $(\mathbb{N} \cup \{+\infty\}) \times (\mathbb{N} \cup \{+\infty\})$.



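Lemma 26. Suppose $d \ge 0$. Let $(\sigma_1, \sigma_2)$ be a secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ and $\rho = \langle\sigma_1, \sigma_2\rangle$
its outcome. Assume that $\rho$ has a prefix $\alpha\beta\gamma$, where $\beta$ contains at least one vertex, such that

$$\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta\gamma), \quad \mathrm{Last}(\alpha) = \mathrm{Last}(\alpha\beta), \quad |\alpha\beta| \le l \cdot |V|, \quad |\alpha\beta\gamma| = (l+1) \cdot |V|$$

for some $l \ge 1$. Then we have

$$(\mathrm{Visit}(\alpha) \neq \emptyset \ \vee\ \mathrm{Visit}(\rho) \neq \{1, 2\}) \ \Rightarrow\ \mathrm{Visit}(\alpha) = \mathrm{Visit}(\rho).$$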

In particular, Lemma 26 implies that if $\alpha$ visits none of the goal sets, then $\rho$ visits either
both goal sets or none. Notice that in the case of Nash equilibria, we can have situations
contradicting Lemma 26, and in particular the previous situation, as can be seen in Example 17.

Proof. By contradiction, assume that $2 \in \mathrm{Visit}(\rho) \setminus \mathrm{Visit}(\alpha)$ (the case where $1 \in \mathrm{Visit}(\rho) \setminus
\mathrm{Visit}(\alpha)$ is symmetric). The hypothesis implies that $1 \in \mathrm{Visit}(\alpha)$ or $1 \notin \mathrm{Visit}(\rho)$.

By Lemma 15, player 1 wins the game $\mathcal{G}_2$ from $\mathrm{Last}(\alpha)$, that is, has a memoryless winning
strategy $\nu_{1,2}$ from this vertex. Then if player 1 plays according to $\sigma_1$ until depth $|\alpha|$, and
then switches to $\nu_{1,2}$ from $\mathrm{Last}(\alpha)$, this strategy is a $\prec_1$-profitable deviation for player 1
w.r.t. $(\sigma_1, \sigma_2)$. Indeed, if $1 \in \mathrm{Visit}(\alpha)$, player 1 manages to increase player 2's cost while
keeping his own cost. On the other hand, if $1 \notin \mathrm{Visit}(\rho)$, either player 1 succeeds in reaching
his goal set (i.e. strictly decreasing his cost), or he does not reach it (and then gets the same cost
as in $\rho$) but succeeds in increasing player 2's cost. Thus we get a contradiction. □

We can now give the proof of Proposition 25. The idea for showing case (i) is to look at
the play $\pi$ of the secure equilibrium in $\mathcal{T}$ and consider the depth $d$ needed to visit all the goal
sets of the players in $\mathrm{Visit}(\pi)$. Then, the secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ is defined exactly as
the secure equilibrium of $\mathcal{T}$.

The proof of case (ii) works much like that of Proposition 13 (although the latter
proposition does not preserve the type of the Nash equilibrium). Thanks to Lemma 26, the
proof reduces to only two cases depending on when the goal sets are visited. In the most
interesting case, a well-chosen prefix $\alpha\beta$, with $\beta$ being a cycle, is first extracted from the
outcome $\rho$ of the secure equilibrium $(\sigma_1, \sigma_2)$ of $\mathrm{Trunc}_d(\mathcal{T})$. The outcome of the required secure
equilibrium of $\mathcal{T}$ will be equal to $\alpha\beta^\omega$. As soon as a player deviates from this play, the other
player punishes him, but the way to define the punishment is here more involved than in the
proof of Proposition 13. In the other case, the proof is simpler, but the ideas are quite the
same.

Before entering the details, let us introduce a notation. For any play $\rho = \rho_0\rho_1 \ldots$ of $\mathcal{G}$ and
any player $i \in \Pi$, we define $\mathrm{Index}_i(\rho)$ as the least index $l$ such that $\rho_l \in \mathrm{Goal}_i$ if it exists, or
$-1$ if not.
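
As an illustration of this notation, the following Python sketch computes $\mathrm{Index}_i$ on a finite prefix of a play and the depth $d := \max\{(|\Pi| + 1) \cdot 2 \cdot |V|, \mathrm{Index}_1(\pi), \mathrm{Index}_2(\pi)\}$ used in the proof of part (i) of Proposition 25. It rests on assumptions that are not in the paper: the play is given as a finite list of vertices long enough to contain every first visit to a goal set, and the names `index` and `truncation_depth` are hypothetical.

```python
def index(play_prefix, goal):
    """Least l with play_prefix[l] in goal, or -1 if there is none."""
    for position, vertex in enumerate(play_prefix):
        if vertex in goal:
            return position
    return -1


def truncation_depth(play_prefix, goals, num_vertices):
    """Depth max{(|Pi| + 1) * 2 * |V|, Index_1, ..., Index_n}.

    goals        -- list with one goal set per player
    num_vertices -- |V|, the number of vertices of the game graph
    """
    base = (len(goals) + 1) * 2 * num_vertices
    return max([base] + [index(play_prefix, goal) for goal in goals])


# Hypothetical two-player instance on a graph with four vertices.
prefix = list("ABCD")
goals = [{"C"}, {"Z"}]
depth = truncation_depth(prefix, goals, num_vertices=4)
```

Returning $-1$ for an unvisited goal set keeps the maximum well defined: such a player simply does not constrain the depth.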

Proof (of Proposition 25). First let us begin with the proof of (i). Suppose that there exists
a secure equilibrium $(\tau_1, \tau_2)$ in $\mathcal{T}$ and that the play $\pi$ is the outcome of this strategy profile.
Let us set $d := \max\{(|\Pi| + 1) \cdot 2 \cdot |V|, \mathrm{Index}_1(\pi), \mathrm{Index}_2(\pi)\}$ and define $(\sigma_1, \sigma_2)$ as the strategy
profile in $\mathrm{Trunc}_d(\mathcal{T})$ corresponding to the strategies $(\tau_1, \tau_2)$ restricted to the finite tree. Clearly
the outcome $\rho$ of $(\sigma_1, \sigma_2)$ is a prefix of $\pi$ and $\mathrm{Visit}(\rho) = \mathrm{Visit}(\pi)$, so $(\sigma_1, \sigma_2)$ and $(\tau_1, \tau_2)$ are of
the same type. It remains to show that $(\sigma_1, \sigma_2)$ is a secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$.

Footnote. We are conscious that it is counterintuitive to use the particular value $-1$ in the definition of $\mathrm{Index}_i$, but it is helpful in the
proofs.

Assume by contradiction that player 1 has a $\prec_1$-profitable deviation $\sigma'_1$ w.r.t. $(\sigma_1, \sigma_2)$ (the
case of player 2 is symmetric). We write $\rho'$ for the outcome of $(\sigma'_1, \sigma_2)$ in $\mathrm{Trunc}_d(\mathcal{T})$. There
are two cases to consider: either player 1 manages to decrease his cost in $\rho'$ w.r.t. $\rho$, or he
pays the same cost as in $\rho$ but he is able to increase the cost of player 2 in $\rho'$ w.r.t. $\rho$. In
both cases, if player 1 plays according to $\sigma'_1$ in $\mathcal{T}$ until depth $d$ and then arbitrarily, one can
easily be convinced that we get a $\prec_1$-profitable deviation w.r.t. $(\tau_1, \tau_2)$ in $\mathcal{T}$. This leads to a
contradiction.

Now let us proceed to the proof of (ii). Let $(\sigma_1, \sigma_2)$ be a secure equilibrium in the
game $\mathrm{Trunc}_d(\mathcal{T})$, where $d \ge (|\Pi| + 1) \cdot 2 \cdot |V|$, and $\rho$ its outcome. We define the prefixes $pq$
and $\alpha\beta\gamma$ as in the proof of Proposition 13 (see Figure 3).

By Lemma 26, there are only two cases to consider:

(a) $\mathrm{Visit}(\alpha) = \emptyset$ and $\mathrm{Visit}(\rho) = \{1, 2\}$;

(b) $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\rho)$.

We define a different secure equilibrium according to the case.

Let us start with case (a): $\mathrm{Visit}(\alpha) = \emptyset$ and $\mathrm{Visit}(\rho) = \{1, 2\}$. We define the following
strategy profile:

$$\tau_i(h) = \begin{cases}
\sigma_i(h) & \text{if } |h| < d, \\
\text{arbitrary} & \text{otherwise,}
\end{cases}$$

where $i = 1, 2$, and arbitrary means that the next vertex is chosen arbitrarily, but in a
memoryless way. Note that the outcome of $(\tau_1, \tau_2)$ is of the form $\alpha'(\beta')^\omega$ where $\mathrm{Visit}(\alpha') = \mathrm{Visit}(\rho) =
\{1, 2\}$ and $\beta'$ is a cycle. So, $(\tau_1, \tau_2)$ has the same type as $(\sigma_1, \sigma_2)$. It remains to prove that
$(\tau_1, \tau_2)$ is a finite-memory secure equilibrium in $\mathcal{T}$.

Assume by contradiction that player 1 has a $\prec_1$-profitable deviation $\tau'_1$ w.r.t. $(\tau_1, \tau_2)$ in $\mathcal{T}$
(the case for player 2 is symmetric). The strategy $\sigma'_1$ equal to $\tau'_1$ in $\mathrm{Trunc}_d(\mathcal{T})$ is clearly a
$\prec_1$-profitable deviation w.r.t. $(\sigma_1, \sigma_2)$, which contradicts the fact that $(\sigma_1, \sigma_2)$ is a
secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$. Moreover, as done in the proof of Lemma 16, $(\tau_1, \tau_2)$ is a
finite-memory strategy profile.

Now we consider case (b): $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\rho)$. As in the proof of Lemma 16, we consider
the infinite play $\alpha\beta^\omega$ in the game $\mathcal{T}$. The basic idea of the strategy profile $(\tau_1, \tau_2)$ is the
same as for the Nash equilibrium case: player 2 (resp. 1) plays according to $\alpha\beta^\omega$ and punishes
player 1 (resp. 2) if he deviates from $\alpha\beta^\omega$, in the following way. Suppose that player 1 deviates
(the case for player 2 is similar). Then player 2 plays according to $\sigma_2$ until depth $|\alpha|$, and after
that, he plays arbitrarily if $\alpha$ visits $\mathrm{Goal}_1$, otherwise he plays according to $\nu_{2,1}$.

We define the same punishment function $P$ as in the proof of Lemma 16: for $v_0$, we define
$P(v_0) = \bot$, and for $h \in H$ such that $\mathrm{Last}(h) \in V_i$ and $v \in V$, we let:

$$P(hv) = \begin{cases}
\bot & \text{if } P(h) = \bot \text{ and } hv \le \alpha\beta^\omega, \\
i & \text{if } P(h) = \bot \text{ and } hv \not\le \alpha\beta^\omega, \\
P(h) & \text{otherwise } (P(h) \neq \bot).
\end{cases}$$


Footnote. Notice that in the second case of the proof of (i), when $\rho$ does not visit $\mathrm{Goal}_1$ in $\mathrm{Trunc}_d(\mathcal{T})$, player 1 may reach his
goal set in $\mathcal{T}$ when deviating in this way, and this would be profitable for him in this game.



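The definition of the secure equilibrium $(\tau_1, \tau_2)$ is as follows: for $h \in H$ such that $\mathrm{Last}(h) \in
V_i$:

$$\tau_i(h) = \begin{cases}
v & \text{if } P(h) = \bot\ (h \le \alpha\beta^\omega), \text{ such that } hv \le \alpha\beta^\omega, \\
\text{arbitrary} & \text{if } P(h) = i, \\
\sigma_i(h) & \text{if } P(h) \neq \bot, i \text{ and } |h| < |\alpha|, \\
\nu_{i,P(h)}(h) & \text{if } P(h) \neq \bot, i,\ |h| \ge |\alpha| \text{ and } \alpha \text{ does not visit } \mathrm{Goal}_{P(h)}, \\
\text{arbitrary} & \text{otherwise } (P(h) \neq \bot, i,\ |h| \ge |\alpha| \text{ and } \alpha \text{ visits } \mathrm{Goal}_{P(h)})
\end{cases}$$

where $i = 1, 2$, and arbitrary means that the next vertex is chosen arbitrarily (in a
memoryless way). Clearly the outcome of $(\tau_1, \tau_2)$ is the play $\alpha\beta^\omega$, and the type of $(\tau_1, \tau_2)$ is equal
to $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\rho)$, the type of $(\sigma_1, \sigma_2)$. Moreover, as done in the proof of Lemma 16, $(\tau_1, \tau_2)$
is a finite-memory strategy profile.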

Remark that the definition of the strategy profile $(\tau_1, \tau_2)$ is a little different from the one
in the proof of Lemma 16 because here, if player 1 deviates (for example), then player 2 has to
prevent him from reaching his goal set $\mathrm{Goal}_1$ (faster), or having the same cost but succeeding
in increasing player 2's cost.

It remains to show that $(\tau_1, \tau_2)$ is a secure equilibrium in the game $\mathcal{T}$. Assume by
contradiction that there exists a $\prec_1$-profitable deviation $\tau'_1$ for player 1 w.r.t. $(\tau_1, \tau_2)$. The case
of a $\prec_2$-profitable deviation for player 2 is similar. We construct a play $\rho'$ in $\mathrm{Trunc}_d(\mathcal{T})$ as
follows: player 1 plays according to the strategy $\tau'_1$ restricted to $\mathrm{Trunc}_d(\mathcal{T})$ (denoted by $\sigma'_1$)
and player 2 plays according to $\sigma_2$. Thus the play $\rho'$ coincides with the play $\pi' = \langle\tau'_1, \tau_2\rangle$ at
least until depth $|\alpha|$ (by definition of $\tau_2$); it can differ afterwards. We have:

    $\rho = \langle\sigma_1, \sigma_2\rangle$ of cost profile $(x_1, x_2)$
    $\rho' = \langle\sigma'_1, \sigma_2\rangle$ of cost profile $(x'_1, x'_2)$
    $\pi = \langle\tau_1, \tau_2\rangle$ of cost profile $(y_1, y_2)$
    $\pi' = \langle\tau'_1, \tau_2\rangle$ of cost profile $(y'_1, y'_2)$.

The situation is depicted in Figure 10.




Fig. 10. Plays $\rho$ and $\pi$, and their respective deviations $\rho'$ and $\pi'$.



By contradiction, we assumed that $\tau'_1$ is a $\prec_1$-profitable deviation for player 1 w.r.t. $(\tau_1, \tau_2)$,
i.e. $(y_1, y_2) \prec_1 (y'_1, y'_2)$. Now we are going to show that $(x_1, x_2) \prec_1 (x'_1, x'_2)$, meaning that $\sigma'_1$
is a $\prec_1$-profitable deviation for player 1 w.r.t. $(\sigma_1, \sigma_2)$ in $\mathrm{Trunc}_d(\mathcal{T})$. This will lead to the
contradiction. As $\tau'_1$ is a $\prec_1$-profitable deviation w.r.t. $(\tau_1, \tau_2)$, one of the following three cases
holds.

(1) $y'_1 < y_1 < +\infty$.

As $\pi = \alpha\beta^\omega$, it means that $\alpha$ visits $\mathrm{Goal}_1$, and then:

$$y'_1 < y_1 = x_1 \le |\alpha|.$$

As $y'_1 < |\alpha|$, we have $x'_1 = y'_1$ (as $\rho'$ and $\pi'$ coincide until depth $|\alpha|$). Therefore $x'_1 < x_1$,
and $(x_1, x_2) \prec_1 (x'_1, x'_2)$.

(2) $y'_1 < y_1 = +\infty$.

If $y'_1 \le |\alpha|$, we have $x'_1 = y'_1$ (by the same argument as before). As $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\rho)$, we
have $x_1 = y_1 = +\infty$ and $x'_1 < x_1$ (and so $(x_1, x_2) \prec_1 (x'_1, x'_2)$).

We show that the case $y'_1 > |\alpha|$ is impossible. By definition of $\tau_2$ the play $\pi'$ is consistent
with $\sigma_2$ until depth $|\alpha|$, and then with $\nu_{2,1}$ (as $y_1 = +\infty$). By Lemma 15 the play $\pi'$
cannot visit $\mathrm{Goal}_1$ after a depth $> |\alpha|$.

(3) $y_1 = y'_1$ and $y_2 < y'_2$.

Note that this implies $y_2 < +\infty$ and $x_2 = y_2$ (as $\pi = \alpha\beta^\omega$). Since $\rho'$ and $\pi'$ coincide until
depth $|\alpha|$, $y_2 < y'_2$ and $x_2 = y_2 \le |\alpha|$, we have

$$x_2 = y_2 < x'_2,$$

showing that the cost of player 2 is increased. In order to ensure that $\sigma'_1$ is a $\prec_1$-profitable
deviation, it remains to show that either player 1 keeps the same cost, or he decreases his
cost.

If $y'_1 = y_1 < +\infty$, it follows as in the first case that:

$$y_1 = x_1 \le |\alpha| \quad \text{and} \quad x'_1 = y'_1.$$

Therefore $x_1 = x'_1$, i.e. player 1 has the same cost in $\rho$ and $\rho'$. And so, $(x_1, x_2) \prec_1 (x'_1, x'_2)$.
On the contrary, if $y'_1 = y_1 = +\infty$, it follows that $x_1 = +\infty$ (as $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\rho)$).
And so, we have that $x'_1 < +\infty = x_1$, or $x'_1 = x_1$. But in both cases, it holds that
$(x_1, x_2) \prec_1 (x'_1, x'_2)$.

In conclusion, we constructed a $\prec_1$-profitable deviation $\sigma'_1$ w.r.t. $(\sigma_1, \sigma_2)$ in $\mathrm{Trunc}_d(\mathcal{T})$, and
so we get a contradiction. □

Remark 27. Let us notice that in case (i) of Proposition 25, the proof remains valid if we
take $d = \max\{0, \mathrm{Index}_1(\pi), \mathrm{Index}_2(\pi)\}$. Thus, in the statement of case (i), the constraint $d \ge
(|\Pi| + 1) \cdot 2 \cdot |V|$ can be replaced by $d \ge 0$.

We can now proceed to the proof of Theorem 22.

Proof (of Theorem 22). Let us set $d := (|\Pi| + 1) \cdot 2 \cdot |V|$ and apply Corollary [H on the
game $\mathrm{Trunc}_d(\mathcal{T})$. Then we get a secure equilibrium in this game. By Proposition 25, there
exists in $\mathcal{G}$ a finite-memory secure equilibrium with the same type. □

Theorem 22 gives a positive answer to Problem 1 for secure equilibria in two-player games. The
next theorem solves Problem 2 for the same kind of games.



Theorem 28. If there exists a secure equilibrium in a quantitative two-player reachability
game $\mathcal{G}$, then there exists a finite-memory secure equilibrium of the same type in $\mathcal{G}$.

Proof. Let $(\sigma_1, \sigma_2)$ be a secure equilibrium in $\mathcal{G}$. By the first part of Proposition 25, there
exists a secure equilibrium of the same type in the game $\mathrm{Trunc}_d(\mathcal{T})$, for a certain depth
$d \ge (|\Pi| + 1) \cdot 2 \cdot |V|$. If we apply the second part of Proposition 25, we get a finite-memory secure
equilibrium of the same type as $(\sigma_1, \sigma_2)$ in $\mathcal{G}$. □

The proof of Theorem 28 is based on Proposition 25 which, roughly speaking, ensures that
every secure equilibrium of $\mathrm{Trunc}_d(\mathcal{T})$ can be lifted to a secure equilibrium of the same type
in $\mathcal{T}$, and conversely. Notice that Proposition 25 has no counterpart for Nash equilibria, since
we cannot guarantee that the type can be preserved, as can be seen from Example 17. This
approach makes the proof of Theorem 28 rather different from the proof of Theorem 18.

Notice that Proposition 25 is stated for two-player games because its proof uses a lemma
that has been proved only for two players.

5 Extensions of the Model 
5.1 Safety Objectives 

Let us now consider quantitative games played on a graph where some players have reachability 
objectives, whereas others have safety objectives. As previously, the players with reachability 
objectives want to reach their goal set as soon as possible. The players with safety objectives 
want to avoid their bad set or, if impossible, delay its visit as long as possible. Let us make 
that precise through the following definition. 

Definition 29. An infinite turn-based quantitative multiplayer reachability/safety game is a
tuple $\mathcal{G} = (\Pi, \Pi_r, \Pi_s, V, (V_i)_{i \in \Pi}, v_0, E, (\mathrm{Goal}_i)_{i \in \Pi_r}, (\mathrm{Bad}_i)_{i \in \Pi_s})$ where

• $\Pi$ is a finite set of players partitioned into $\Pi_r$ and $\Pi_s$, which are the players with
reachability and safety objectives respectively,

• $G = (V, (V_i)_{i \in \Pi}, v_0, E)$ is a finite directed graph where $V$ is the set of vertices, $(V_i)_{i \in \Pi}$
is a partition of $V$ into the state sets of each player, $v_0 \in V$ is the initial vertex, and
$E \subseteq V \times V$ is the set of edges, and

• $\mathrm{Goal}_i \subseteq V$ is the goal set of player $i$, for $i \in \Pi_r$; $\mathrm{Bad}_i \subseteq V$ is the bad set of player $i$, for
$i \in \Pi_s$.

For any play $\rho = \rho_0 \rho_1 \cdots$ of $\mathcal{G}$, we note $\mathrm{Cost}_i(\rho)$ the cost of player $i$. For $i \in \Pi_r$ the cost
is defined as before and for $i \in \Pi_s$ the cost is defined by:

\[
\mathrm{Cost}_i(\rho) =
\begin{cases}
-l & \text{if } l \text{ is the least index such that } \rho_l \in \mathrm{Bad}_i, \\
-\infty & \text{otherwise.}
\end{cases}
\]

As before, the aim of each player $i$ is to minimize his cost, i.e. reach his goal set $\mathrm{Goal}_i$ as soon
as possible for $i \in \Pi_r$, or delay the visit of $\mathrm{Bad}_i$ as long as possible for $i \in \Pi_s$. The notions
of play, strategy, outcome and Nash equilibrium extend in a natural way. The main result of
this subsection is the following theorem, which solves Problem 1 in this framework.

Theorem 30. In every quantitative multiplayer reachability/safety game, there exists a
finite-memory Nash equilibrium.
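The two cost functions of this framework can be sketched in a few lines. This is only an illustration under stated assumptions: the play is given as a lasso $\alpha\beta^\omega$ (two lists of vertices), the reachability cost is taken to be the index of the first visit to the goal set, and the safety cost is read as the negated index of the first visit to the bad set. The function names are hypothetical.

```python
import math

def visited_prefix(alpha, beta):
    """Every vertex of the play alpha.beta^omega already occurs in alpha + beta,
    so first-visit indices can be computed on this finite prefix."""
    return list(alpha) + list(beta)

def reach_cost(alpha, beta, goal):
    """Reachability player: least index l with rho_l in goal, +inf if none."""
    for l, v in enumerate(visited_prefix(alpha, beta)):
        if v in goal:
            return l
    return math.inf

def safety_cost(alpha, beta, bad):
    """Safety player: -l for the least index l with rho_l in bad, -inf if none."""
    for l, v in enumerate(visited_prefix(alpha, beta)):
        if v in bad:
            return -l
    return -math.inf

# Hypothetical play v0 v1 (v2 v3)^omega.
alpha, beta = ["v0", "v1"], ["v2", "v3"]
early = safety_cost(alpha, beta, {"v1"})   # bad set hit at index 1
late = safety_cost(alpha, beta, {"v3"})    # bad set hit at index 3
```

With this sign convention, minimizing the cost and delaying the visit of the bad set coincide: `late` is smaller than `early`, and a play that never visits the bad set has the smallest possible cost.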



In order to prove Theorem 30, we have to revisit the results of Section 3. Let us first notice
that Lemma 15 remains true in this framework when player $j$ belongs to $\Pi_r$. Lemma 16
remains true; however, we have to slightly adapt its proof.

Proof (of Lemma 16 in the case of reachability/safety objectives).

Let us first introduce some notations. In the rest of the proof, we denote by $\Pi_r^f$ (resp. $\Pi_s^f$)
the subset of players $i \in \Pi_r$ (resp. $i \in \Pi_s$) such that $\alpha$ visits $\mathrm{Goal}_i$ (resp. $\mathrm{Bad}_i$) and by $\Pi_r^\infty$
(resp. $\Pi_s^\infty$) the set $\Pi_r \setminus \Pi_r^f$ (resp. $\Pi_s \setminus \Pi_s^f$).

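The punishment function $P$ is defined exactly as in the proof of Lemma 16. For $v_0$, we
define $P(v_0) = \bot$ and for $h \in V^+$ such that $\mathrm{Last}(h) \in V_i$ and $v \in V$ we let:

\[
P(hv) =
\begin{cases}
\bot & \text{if } P(h) = \bot \text{ and } hv < \alpha\beta^\omega, \\
i & \text{if } P(h) = \bot \text{ and } hv \not< \alpha\beta^\omega, \\
P(h) & \text{otherwise } (P(h) \neq \bot).
\end{cases}
\]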
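The punishment function can be computed by a single pass over a history. The sketch below is illustrative only: it assumes that the intended play is given as a lasso (lists `alpha` and `beta`, with `beta` non-empty), that `owner` maps each vertex to the player controlling it, and that the history starts in the initial vertex of the lasso. `BOTTOM`, `lasso_vertex` and `punishment` are hypothetical names.

```python
BOTTOM = None  # stands for the symbol "bottom": nobody has deviated so far

def lasso_vertex(alpha, beta, k):
    """k-th vertex (0-based) of the infinite play alpha.beta^omega."""
    if k < len(alpha):
        return alpha[k]
    return beta[(k - len(alpha)) % len(beta)]

def punishment(history, owner, alpha, beta):
    """Return P(history): BOTTOM while the history follows alpha.beta^omega,
    otherwise the player who first left it."""
    p = BOTTOM                                  # P(v0) = bottom
    for k in range(1, len(history)):
        if p is not BOTTOM:                     # P(hv) = P(h) once someone is blamed
            break
        if history[k] != lasso_vertex(alpha, beta, k):
            p = owner[history[k - 1]]           # the owner of Last(h) deviated
    return p

# Hypothetical two-player arena.
alpha, beta = ["v0", "v1"], ["v2", "v3"]
owner = {"v0": 1, "v1": 2, "v2": 1, "v3": 2}
on_track = punishment(["v0", "v1", "v2", "v3", "v2"], owner, alpha, beta)
blamed = punishment(["v0", "v1", "v0", "v1"], owner, alpha, beta)
```

Only the first deviation matters: once a player is recorded, later moves do not change the value, which is what makes the function implementable with finite memory.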

The difference with the proof of Lemma 16 arises in the definition of the Nash equilibrium
$(\tau_i)_{i \in \Pi}$. The new equilibrium needs to incorporate an adequate punishment for the
players with safety objectives. More precisely, in order to dissuade a player $j \in \Pi_s^f$ from
deviating, the other players punish him by playing the strategies $(\sigma_i)_{i \in \Pi \setminus \{j\}}$ in $\mathrm{Trunc}_d(\mathcal{T})$. Notice
that a player $j \in \Pi_s^\infty$ has no incentive to deviate. Formally we define the Nash equilibrium
$(\tau_i)_{i \in \Pi}$ as follows. For $h \in H$ such that $\mathrm{Last}(h) \in V_i$,

\[
\tau_i(h) =
\begin{cases}
v & \text{if } P(h) = \bot \ (h < \alpha\beta^\omega), \text{ with } v \text{ such that } hv < \alpha\beta^\omega, \\
\text{arbitrary} & \text{if } P(h) = i, \\
\nu_{i,P(h)}(h) & \text{if } P(h) \neq \bot, i \text{ and } P(h) \in \Pi_r^\infty, \\
\sigma_i(h) & \text{if } P(h) \neq \bot, i, \ P(h) \in \Pi_r^f \cup \Pi_s^f \text{ and } |h| < d, \\
\text{arbitrary} & \text{otherwise,}
\end{cases}
\tag{6}
\]

where arbitrary means that the next vertex is chosen arbitrarily (in a memoryless way). Clearly
the outcome of $(\tau_i)_{i \in \Pi}$ is the play $\alpha\beta^\omega$, and $\mathrm{Type}((\tau_i)_{i \in \Pi})$ is equal to $\mathrm{Visit}(\alpha)$ ($= \mathrm{Visit}(\alpha\beta)$).

It remains to prove that $(\tau_i)_{i \in \Pi}$ is a finite-memory Nash equilibrium in the game $\mathcal{T}$. In
order to do so, we prove that none of the players has a profitable deviation. For players with
reachability objectives, the arguments are exactly the same as the ones provided in the proof
of Lemma 16. Let us now consider players with safety objectives. In the case where $j \in \Pi_s^\infty$,
player $j$ has clearly no incentive to deviate. In the case where $j \in \Pi_s^f$, to decrease his cost,
player $j$ has no incentive to deviate after the prefix $\alpha$. Thus we assume that the strategy $\tau'_j$
causes a deviation from a vertex visited in $\alpha$. By Equation (6) the other players first play
according to $(\sigma_i)_{i \in \Pi \setminus \{j\}}$ in $\mathrm{Trunc}_d(\mathcal{T})$, and then in an arbitrary way.

Suppose that $\tau'_j$ is a profitable deviation for player $j$ w.r.t. $(\tau_i)_{i \in \Pi}$ in the game $\mathcal{T}$. Let us
set $\pi = \langle (\tau_i)_{i \in \Pi} \rangle$ and $\pi' = \langle \tau'_j, (\tau_i)_{i \in \Pi \setminus \{j\}} \rangle$. Then

\[ \mathrm{Cost}_j(\pi') < \mathrm{Cost}_j(\pi). \]

On the other hand we know that

\[ \mathrm{Cost}_j(\pi) = \mathrm{Cost}_j(\rho) \le |\alpha|. \]

So if we limit the play $\pi'$ in $\mathcal{T}$ to its prefix of length $d$, we get a play $\rho'$ in $\mathrm{Trunc}_d(\mathcal{T})$ such that

\[ \mathrm{Cost}_j(\rho') \le \mathrm{Cost}_j(\pi') < \mathrm{Cost}_j(\rho). \]

Notice that we do not necessarily have $\mathrm{Cost}_j(\rho') = \mathrm{Cost}_j(\pi')$ (as in the proof of Lemma 16)
since the bad set $\mathrm{Bad}_j$ can be visited by $\pi'$ and not by $\rho'$. As the play $\rho'$ is consistent with
the strategies $(\sigma_i)_{i \in \Pi \setminus \{j\}}$ by Equation (6), the strategy $\tau'_j$ restricted to the tree $\mathrm{Trunc}_d(\mathcal{T})$ is
a profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$ in the game $\mathrm{Trunc}_d(\mathcal{T})$. This is impossible.
Moreover, as done in the proof of Lemma 16, $(\tau_i)_{i \in \Pi}$ is a finite-memory strategy profile.

□

Since Lemma 16 holds in the context of reachability/safety objectives, Proposition 13
ensures that the equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ provided by Kuhn's theorem (Corollary 12) can be
lifted to $\mathcal{T}$. This proves Theorem 30.



5.2 Tuples of Costs on Edges 

In this subsection, we come back to a pure reachability framework and we extend our model 
in the following way: we assume that edges are labelled with tuples of positive costs (one cost 
for each player). Here we do not only count the number of edges to reach the goal of a player, 
but we sum up his costs along the path until his goal is reached. His aim is still to minimize 
his global cost for a play. We generalize Definition 1.

Definition 31. An infinite turn-based quantitative multiplayer reachability game with tuples
of costs on edges is a tuple $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, v_0, E, (\mathrm{Cost}_i)_{i \in \Pi}, (\mathrm{Goal}_i)_{i \in \Pi})$ where

• $\Pi$ is a finite set of players,

• $G = (V, (V_i)_{i \in \Pi}, v_0, E)$ is a finite directed graph where $V$ is the set of vertices, $(V_i)_{i \in \Pi}$
is a partition of $V$ into the state sets of each player, $v_0 \in V$ is the initial vertex, and
$E \subseteq V \times V$ is the set of edges,

• $\mathrm{Cost}_i : E \to \mathbb{R}^{>0}$ is the cost function of player $i$ defined on the edges of the graph,

• $\mathrm{Goal}_i \subseteq V$ is the goal set of player $i$.

We also positively solve Problem 1 for Nash equilibria in this context.

Theorem 32. In every quantitative multiplayer reachability game with tuples of costs on
edges, there exists a finite-memory Nash equilibrium.

To prove Theorem 32, we follow the same scheme as in Section 3. In particular, we rely
on Kuhn's theorem (Corollary 12) and need to prove a counterpart of Lemma 15, Lemma 16
and Proposition 13 in this framework.

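Let us first introduce some notations that will be useful in this context. We define

\[
c_{\min} := \min_{i \in \Pi} \min_{e \in E} \mathrm{Cost}_i(e), \qquad
c_{\max} := \max_{i \in \Pi} \max_{e \in E} \mathrm{Cost}_i(e), \qquad
K := \left\lceil \frac{c_{\max}}{c_{\min}} \right\rceil .
\]

It is clear that $c_{\min}, c_{\max} > 0$ and $K \ge 1$.

We also adapt the definition of $\mathrm{Cost}_i(\rho)$, the cost of player $i$ for a play $\rho = \rho_0 \rho_1 \cdots$:

\[
\mathrm{Cost}_i(\rho) =
\begin{cases}
\sum_{k=1}^{l} \mathrm{Cost}_i((\rho_{k-1}, \rho_k)) & \text{if } l \text{ is the least index such that } \rho_l \in \mathrm{Goal}_i, \\
+\infty & \text{otherwise.}
\end{cases}
\]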
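As an illustration, the constants and the adapted cost can be computed as follows. The sketch makes several assumptions that are not in the text: costs are stored in nested dictionaries, $K$ is read as the ceiling of $c_{\max}/c_{\min}$ (which is what the later step "by definition of $K$" requires), and the play is given by a finite prefix assumed long enough to contain the first visit to the goal set, if there is one. All names are hypothetical.

```python
import math

def constants(cost):
    """cost: dict player -> dict edge -> positive number.
    Returns (c_min, c_max, K) with K = ceil(c_max / c_min)."""
    values = [c for per_edge in cost.values() for c in per_edge.values()]
    c_min, c_max = min(values), max(values)
    return c_min, c_max, math.ceil(c_max / c_min)

def weighted_cost(prefix, goal, edge_cost):
    """Sum of the player's edge costs up to the first vertex in goal;
    +inf if the prefix never reaches goal."""
    total = 0
    for k, v in enumerate(prefix):
        if k > 0:
            total += edge_cost[(prefix[k - 1], v)]
        if v in goal:
            return total
    return math.inf

# Hypothetical game with two players and two edges.
cost = {
    1: {("a", "b"): 2, ("b", "c"): 3},
    2: {("a", "b"): 2, ("b", "c"): 7},
}
c_min, c_max, K = constants(cost)
cost_player_2 = weighted_cost(["a", "b", "c"], {"c"}, cost[2])
```

When every edge has cost 1 for every player, `weighted_cost` returns the index of the first visit and `K` equals 1, so the unweighted model is recovered.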



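The counterpart of Lemma 15 is the following one, taking into account the constant $K$
defined before.

Lemma 33. Suppose $d \ge 0$. Let $(\sigma_i)_{i \in \Pi}$ be a Nash equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ and $\rho$ the
outcome of $(\sigma_i)_{i \in \Pi}$. Assume that $\rho$ has a prefix $\alpha\beta\gamma$, where $\beta$ contains at least one vertex, such
that

\[
\begin{aligned}
\mathrm{Visit}(\alpha) &= \mathrm{Visit}(\alpha\beta\gamma) \\
\mathrm{Last}(\alpha) &= \mathrm{Last}(\alpha\beta) \\
|\alpha\beta| &\le l \cdot |V| \\
|\alpha\beta\gamma| &= (l + K) \cdot |V|
\end{aligned}
\]

for some $l \ge 1$.

Let $j \in \Pi$ be such that $\alpha$ does not visit $\mathrm{Goal}_j$. Consider the qualitative two-player zero-sum
game $\mathcal{G}_j = (V, V_j, V \setminus V_j, E, \mathrm{Goal}_j)$. Then for all histories $hu$ of $\mathcal{G}$ consistent with $(\sigma_i)_{i \in \Pi \setminus \{j\}}$
and such that $hu < \alpha\beta$, the coalition of the players $i \neq j$ wins the game $\mathcal{G}_j$ from $u$.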

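Proof (Sketch). As for the proof of Lemma 15, we proceed by contradiction and define a play
$\rho'$ in the very same way. We can deduce that

\[
\begin{aligned}
\mathrm{Index}_j(\rho') &\le |hu| + |V| && \text{(by Proposition 14)} \\
&\le (l + 1) \cdot |V| && \text{(by hypothesis)} \\
&\le (l + K) \cdot |V| && \text{(as } K \ge 1\text{)} \\
&\le d && \text{(as } \alpha\beta\gamma < \rho\text{)}.
\end{aligned}
\]

The case where $\mathrm{Cost}_j(\rho) = +\infty$ is solved in the same way. For the other case $\mathrm{Cost}_j(\rho) < +\infty$,
we note $C_j(hu)$ the sum of the costs of player $j$ along the prefix $hu$. We have the following
inequalities (see Figure 11):

\[
\begin{aligned}
\mathrm{Cost}_j(\rho') &\le C_j(hu) + c_{\max} \cdot |V| \\
\mathrm{Cost}_j(\rho) &> C_j(hu) + c_{\min} \cdot K \cdot |V| && \text{(as } \mathrm{Index}_j(\rho) > (l + K) \cdot |V|\text{)} \\
&\ge C_j(hu) + c_{\min} \cdot \frac{c_{\max}}{c_{\min}} \cdot |V| && \text{(by definition of } K\text{)} \\
&= C_j(hu) + c_{\max} \cdot |V| .
\end{aligned}
\]

Then we have $\mathrm{Cost}_j(\rho') < \mathrm{Cost}_j(\rho)$, and since $\rho'$ is consistent with the strategies
$(\sigma_i)_{i \in \Pi \setminus \{j\}}$, the strategy
of player $j$ induced by the play $\rho'$ is a profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$. This
contradicts the fact that $(\sigma_i)_{i \in \Pi}$ is a Nash equilibrium in the game $\mathrm{Trunc}_d(\mathcal{T})$. □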

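The following lemma is the counterpart of Lemma 16.

Lemma 34. Suppose $d \ge 0$. Let $(\sigma_i)_{i \in \Pi}$ be a Nash equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$ and $\alpha\beta\gamma$ be a
prefix of $\rho = \langle (\sigma_i)_{i \in \Pi} \rangle$ as defined in Lemma 33, where $|\alpha\beta\gamma| = (l + K) \cdot |V|$ for some $l \ge 1$
such that $l \le \frac{d}{|V| \cdot K} + 1$.

Then there exists a Nash equilibrium $(\tau_i)_{i \in \Pi}$ in the game $\mathcal{T}$. Moreover $(\tau_i)_{i \in \Pi}$ is
finite-memory, and $\mathrm{Type}((\tau_i)_{i \in \Pi}) = \mathrm{Visit}(\alpha)$.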

Proof. We prove this result in the very same way as Lemma 16. The only difference lies in
the case $j \le k$ when we show that $(\tau_i)_{i \in \Pi}$ is a Nash equilibrium. Indeed when $j > k$, i.e.
when player $j$ has not reached his goal set, the coalition punishes him in the exact same way
as in Lemma 16 by preventing him from visiting his goal set.

We suppose that $\tau'_j$ is a profitable deviation for player $j$ w.r.t. $(\tau_i)_{i \in \Pi}$ in the game $\mathcal{T}$. So we
have $\mathrm{Cost}_j(\pi') < \mathrm{Cost}_j(\pi)$, where $\pi = \langle (\tau_i)_{i \in \Pi} \rangle$ and $\pi' = \langle \tau'_j, (\tau_i)_{i \in \Pi \setminus \{j\}} \rangle$. As
$\mathrm{Index}_j(\pi) \le |\alpha|$, we know that $\mathrm{Cost}_j(\pi) \le |\alpha| \cdot c_{\max}$. It follows that $\mathrm{Cost}_j(\pi') < |\alpha| \cdot c_{\max}$ and

\[
\begin{aligned}
\mathrm{Index}_j(\pi') &< |\alpha| \cdot K \\
&\le (l - 1) \cdot |V| \cdot K \\
&\le d && \text{(by hypothesis)}.
\end{aligned}
\]

The first inequality can be justified as follows. For a contradiction, let us assume that
$\mathrm{Index}_j(\pi') \ge |\alpha| \cdot K$. It follows that $\mathrm{Cost}_j(\pi') \ge c_{\min} \cdot |\alpha| \cdot K$, which is at least
$|\alpha| \cdot c_{\max}$ by definition of $K$. This contradicts the fact that $\mathrm{Cost}_j(\pi') < |\alpha| \cdot c_{\max}$.

As in the proof of Lemma 16, we limit the play $\pi'$ in $\mathcal{T}$ to its prefix of length $d$ and get a
profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i \in \Pi}$ in the game $\mathrm{Trunc}_d(\mathcal{T})$, contradicting the fact
that $(\sigma_i)_{i \in \Pi}$ is a Nash equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$.

Moreover, as done in the proof of Lemma 16, $(\tau_i)_{i \in \Pi}$ is a finite-memory strategy profile. □

Fig. 11. Plays $\rho$ and $\rho'$ with their common prefix $hu$.

As a consequence of the two previous lemmas, Proposition 13 remains true in this context;
we only have to adjust the depth $d$ of the finite tree.

Proposition 35. Let $\mathcal{G}$ be a game and $\mathcal{T}$ be the unraveling of $\mathcal{G}$. Let $\mathrm{Trunc}_d(\mathcal{T})$ be the game
played on the truncated tree of $\mathcal{T}$ of depth $d = \max\{(|\Pi| + 1) \cdot (K + 1) \cdot |V|, \ (|\Pi| \cdot (K + 1) + 1) \cdot |V| \cdot K\}$.
If there exists a Nash equilibrium in the game $\mathrm{Trunc}_d(\mathcal{T})$, then there exists a finite-memory
Nash equilibrium in the game $\mathcal{T}$.

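Proof. The proof is similar to the proof of Proposition 13. Let $(\sigma_i)_{i \in \Pi}$ be a Nash equilibrium
in the game $\mathrm{Trunc}_d(\mathcal{T})$ and $\rho$ its outcome. We consider the prefix $pq$ of $\rho$ of minimal length
such that

\[
\exists\, l \ge 1 \quad
\begin{aligned}
|p| &= (l - 1) \cdot |V| \\
|pq| &= (l + K) \cdot |V| \\
\mathrm{Visit}(p) &= \mathrm{Visit}(pq).
\end{aligned}
\]

In the worst case, the play $\rho$ visits the goal set of a new player in each prefix of length
$i \cdot (K + 1) \cdot |V|$, $1 \le i \le |\Pi|$, i.e. $|p| = |\Pi| \cdot (K + 1) \cdot |V|$. So we know that $l \le |\Pi| \cdot (K + 1) + 1$ and $pq$
exists as a prefix of $\rho$, because the length $d$ of $\rho$ is greater than or equal to $(|\Pi| + 1) \cdot (K + 1) \cdot |V|$
by hypothesis.

Given the length of $q$ ($K \ge 1$), one vertex of $V$ is visited at least twice by $q$. More precisely,
we can write

\[
pq = \alpha\beta\gamma \quad\text{with}\quad
\begin{aligned}
\mathrm{Last}(\alpha) &= \mathrm{Last}(\alpha\beta) \\
|\alpha| &\ge (l - 1) \cdot |V| \\
|\alpha\beta| &\le l \cdot |V| .
\end{aligned}
\]

We have $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta\gamma)$, and $|\alpha\beta\gamma| = (l + K) \cdot |V|$.
Moreover, the following inequality holds:

\[
d \ge (|\Pi| \cdot (K + 1) + 1) \cdot |V| \cdot K \ge l \cdot |V| \cdot K, \quad\text{and so}\quad l \le \frac{d}{|V| \cdot K} .
\]

Then, we can apply Lemma 34 and get a finite-memory Nash equilibrium $(\tau_i)_{i \in \Pi}$ in the
game $\mathcal{T}$ such that $\mathrm{Type}((\tau_i)_{i \in \Pi}) = \mathrm{Visit}(\alpha)$. □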

Thanks to Corollary 12 and Proposition 35, one can easily deduce Theorem 32.
Let us comment on the depth $d$ chosen in Proposition 35. It is defined as the maximum
between $d_1 := (|\Pi| + 1) \cdot (K + 1) \cdot |V|$ and $d_2 := (|\Pi| \cdot (K + 1) + 1) \cdot |V| \cdot K$. One can easily
prove that $d_1 < d_2$ if and only if $K^2 > \frac{|\Pi| + 1}{|\Pi|}$.
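The comparison between the two depths can be checked by a short computation. The following derivation is a sketch that writes $n$ for $|\Pi|$ (a shorthand introduced here only for readability) and divides both depths by $|V| > 0$:

```latex
\begin{align*}
d_1 < d_2
  &\iff (n + 1)(K + 1) < \bigl(n(K + 1) + 1\bigr) K \\
  &\iff nK + n + K + 1 < nK^2 + nK + K \\
  &\iff n + 1 < nK^2 \\
  &\iff K^2 > \frac{n + 1}{n} .
\end{align*}
```

Since $1 < \frac{n+1}{n} \le 2$ for every $n \ge 1$, the condition fails for $K = 1$ and holds for every integer $K \ge 2$. So, if $K$ is an integer (as it is when $K$ is a ceiling), $d_2$ is the larger depth exactly when $K > 1$.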

We now investigate an alternative method to handle simple cost functions. More precisely
we only consider cost functions $(\mathrm{Cost}_i)_{i \in \Pi}$ such that for all $i, j \in \Pi$ we have that $\mathrm{Cost}_i = \mathrm{Cost}_j$
and $\mathrm{Cost}_i : E \to \mathbb{N}_0$. In other words, there is a unique non-zero natural cost on
every edge. Later on we are going to compare the depths of the finite trees obtained by the
two methods.

In the case of these simple cost functions, we can directly deduce Theorem 32 by replacing
any edge of cost $c$ by a path of length $c$ composed of $c$ new edges (of cost 1) and then applying
the results of Section 3 on this new game. If we write $\mathcal{G}' = (\Pi, V', (V'_i)_{i \in \Pi}, v_0, E', (\mathrm{Goal}_i)_{i \in \Pi})$
the new game obtained by adding new vertices and edges when necessary, it holds that:

\[
\begin{aligned}
|V'| &\le |V| + (c_{\max} - 1) \cdot |E| \\
&\le |V| + (c_{\max} - 1) \cdot |V|^2, \text{ and} \\
|E'| &\le c_{\max} \cdot |E| .
\end{aligned}
\]
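The edge-subdivision step can be sketched as follows. This is an illustrative construction under assumptions not fixed by the text: costs are positive integers stored in a dictionary keyed by edge, and each intermediate vertex is named by a tuple `(u, v, k)`, a hypothetical naming scheme chosen only to keep the new vertices distinct from the original ones. The ownership of the new vertices is left out, since each of them has a single outgoing edge.

```python
def subdivide(vertices, edges, cost):
    """Replace every edge of cost c by a path of c unit-cost edges.
    Returns the vertex list and edge list of the new graph."""
    new_vertices = list(vertices)
    new_edges = []
    for (u, v) in edges:
        prev = u
        for k in range(1, cost[(u, v)]):
            mid = (u, v, k)              # fresh intermediate vertex
            new_vertices.append(mid)
            new_edges.append((prev, mid))
            prev = mid
        new_edges.append((prev, v))      # last unit edge reaches the target
    return new_vertices, new_edges

# Hypothetical graph: five vertices, three edges of cost 1, one of cost 100.
vertices = ["a", "b", "c", "d", "e"]
cost = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "e"): 100}
edges = list(cost)
new_vertices, new_edges = subdivide(vertices, edges, cost)
c_max = max(cost.values())
```

An edge of cost $c$ contributes $c - 1$ new vertices and $c$ new edges, which is where the bounds on $|V'|$ and $|E'|$ come from.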

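If we apply Proposition 13, the depth $d'$ of the finite tree that is considered satisfies:

\[
\begin{aligned}
d' &= (|\Pi| + 1) \cdot 2 \cdot |V'| \\
&\le (|\Pi| + 1) \cdot 2 \cdot (|V| + (c_{\max} - 1) \cdot |E|) \\
&\le (|\Pi| + 1) \cdot 2 \cdot (|V| + (c_{\max} - 1) \cdot |V|^2) .
\end{aligned}
\]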

Whereas if we apply Proposition 35 directly on the initial game $\mathcal{G}$, we have the following
equality:

\[ d = \max\{(|\Pi| + 1) \cdot (K + 1) \cdot |V|, \ (|\Pi| \cdot (K + 1) + 1) \cdot |V| \cdot K\} . \]

Let us first notice that if all the edges of $\mathcal{G}$ are labelled with the same cost (i.e., $c_{\max} = c_{\min}$
and $K = 1$), then

\[
\begin{aligned}
d' &= (|\Pi| + 1) \cdot 2 \cdot (|V| + (c_{\max} - 1) \cdot |E|), \text{ and} \\
d &= (|\Pi| + 1) \cdot 2 \cdot |V| .
\end{aligned}
\]

And so,

if $c_{\max} = c_{\min} = 1$, then $d' = d = (|\Pi| + 1) \cdot 2 \cdot |V|$, and

if $c_{\max} = c_{\min} > 1$, then $d' > d$.

When $K > 1$, the comparison between $d$ and $d'$ depends on the values of many parameters
of the game. For example, if the graph of the game has five vertices, three edges of cost 1
and one edge of cost 100, then it is more advantageous to use the game $\mathcal{G}'$ and techniques from
Section 3 to construct the Nash equilibrium, because in this case, $d' = (|\Pi| + 1) \cdot 2 \cdot 104$ and
$d = (|\Pi| \cdot 101 + 1) \cdot 5 \cdot 100$, and so $d \gg d'$.
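The two depths can be compared numerically. The sketch below evaluates the depth formula of Proposition 35 and the depth $(|\Pi| + 1) \cdot 2 \cdot |V'|$ of the subdivided game on the example above. The number of players is not fixed in the text, so the value 2 used here is an assumption, as are the function names.

```python
def depth_direct(n_players, n_vertices, K):
    """Depth d of Proposition 35: the maximum of d_1 and d_2."""
    d1 = (n_players + 1) * (K + 1) * n_vertices
    d2 = (n_players * (K + 1) + 1) * n_vertices * K
    return max(d1, d2)

def depth_subdivided(n_players, n_vertices_prime):
    """Depth d' obtained on the subdivided game with |V'| vertices."""
    return (n_players + 1) * 2 * n_vertices_prime

# Five vertices, three edges of cost 1 and one edge of cost 100:
# K = 100, and subdividing the expensive edge adds 99 vertices, so |V'| = 104.
n_players = 2                      # assumed value, for illustration only
d = depth_direct(n_players, 5, 100)
d_prime = depth_subdivided(n_players, 5 + 99)
ratio = d / d_prime
```

In this instance the direct depth is more than a hundred times the subdivided one. With a single cost value on all edges ($K = 1$), `depth_direct(n, v, 1)` reduces to `(n + 1) * 2 * v`, matching the case discussed above.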

6 Conclusion and Perspectives 

In this paper, we first prove the existence of finite-memory Nash equilibria for quantitative
multiplayer reachability games played on finite graphs. We also prove that this result remains
true when the model is enriched by allowing n-tuples of non-negative costs on edges (one cost
per player), answering a question we posed in [3]. Moreover we extend our existence result to
quantitative games where both safety and reachability objectives coexist. Secondly, we prove
the existence of finite-memory secure equilibria for quantitative two-player reachability games
played on finite graphs.

There are several interesting directions for further research. First, we intend to investigate
the existence of secure equilibria in the n-player framework. Notice that the proof techniques
related to our results on secure equilibria rely on the two-player assumption. Furthermore,
we also want to investigate more deeply the size of the memory needed in the equilibria. This
could be a first step towards a study of the complexity of computing equilibria with certain
requirements, in the spirit of [9]. We also intend to look for existence results for subgame
perfect equilibria. Finally we would like to address these questions for other objectives such as
Büchi or request-response.